Elasticsearch Basic Search Operations

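In early 2021 I signed up for Alibaba Cloud's Elasticsearch "hundred-person challenge", in which participants jointly wrote the ELK Operations Manual. I was fortunate to work on the basic-capabilities part, covering basic search operations, and have organized some of that material here for your reference.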

Business background

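In the B2B industry, product search and display come with specific business requirements. For example, a buyer can see the products of a supplier's store only if the buyer and the supplier have a partnership; buyers without one are not shown those products. In addition, some products show one price to customer A and a different price to other customers, so that prices differ by membership tier or customer group. In short, B2B product sales are closed and customer-specific. All examples below are set in this context, so that you can get familiar with Elasticsearch document, index and query operations in a realistic business scenario.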

Defining the mapping

The product fields are:

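goodsName: product name
skuCode: product SKU code
brandName: brand name
channelType: channel type
shopCode: store code
publicPrice: selling price (the base price, visible to everyone)
closeUserCode: closed-member codes
groupPrice: group prices, stored as a nested type
boxLevelPrice: group price (a field inside groupPrice)
level: group level (a field inside groupPrice)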

Define the product mapping:
PUT my_goods_20210423
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "goodsName": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "skuCode": {
        "type": "keyword"
      },
      "brandName": {
        "type": "keyword"
      },
      "channelType": {
        "type": "keyword"
      },
      "shopCode": {
        "type": "keyword"
      },
      "publicPrice": {
        "type": "float"
      },
      "closeUserCode": {
        "type": "text",
        "analyzer": "standard"
      },
      "boostValue": {
        "type": "keyword"
      },
      "groupPrice": {
        "type": "nested",
        "properties": {
          "boxLevelPrice": {
            "type": "float"
          },
          "level": {
            "type": "text"
          }
        }
      }
    }
  }
}

Document operations

The main operations are create, delete, update and query, covered in turn below.

1. Create

A document can be created with any of the following requests:

PUT /<target>/_doc/<_id>

POST /<target>/_doc/

PUT /<target>/_create/<_id>

POST /<target>/_create/<_id>
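The four forms differ in what happens when the ID already exists. The following Python sketch is a toy in-memory model of that behavior, not an Elasticsearch client; `ToyIndex` and `VersionConflict` are hypothetical names. It assumes the documented semantics: `PUT _doc/<_id>` creates or replaces, `POST _doc/` lets Elasticsearch generate the ID, and `_create` refuses to overwrite an existing document (Elasticsearch answers with HTTP 409).

```python
import uuid


class VersionConflict(Exception):
    """Stands in for the HTTP 409 conflict that _create returns for an existing ID."""


class ToyIndex:
    """Hypothetical in-memory model of an index; not an Elasticsearch client."""

    def __init__(self):
        self.docs = {}

    def put_doc(self, doc_id, source):
        # PUT /<target>/_doc/<_id>: create the document, or replace it if it exists
        result = "updated" if doc_id in self.docs else "created"
        self.docs[doc_id] = dict(source)
        return result

    def post_doc(self, source):
        # POST /<target>/_doc/: the ID is generated for you
        doc_id = uuid.uuid4().hex
        self.docs[doc_id] = dict(source)
        return doc_id

    def create(self, doc_id, source):
        # PUT or POST /<target>/_create/<_id>: only succeeds if the ID is new
        if doc_id in self.docs:
            raise VersionConflict(f"document [{doc_id}] already exists")
        self.docs[doc_id] = dict(source)
        return "created"


index = ToyIndex()
print(index.create("1", {"skuCode": "skuCode1"}))   # created
print(index.put_doc("1", {"skuCode": "skuCode1b"}))  # updated
try:
    index.create("1", {"skuCode": "skuCode1c"})
except VersionConflict as err:
    print("409:", err)
```

In short, use `_create` when accidentally overwriting an existing product must be treated as an error.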

Taking POST /<target>/_create/<_id> as an example, the following creates the product with document ID 1:

POST /my_goods_20210423/_create/1
{
    "goodsName":"苹果 51英寸 4K超高清",
    "skuCode":"skuCode1",
    "brandName":"苹果",
    "closeUserCode":[
        "0"
    ],
    "channelType":"cloudPlatform",
    "shopCode":"sc00001",
    "publicPrice":"8188.88",
    "groupPrice":null,
    "boxPrice":null,
    "boostValue":1.8
}

Elasticsearch also supports bulk indexing with the _bulk API:

POST my_goods_20210423/_bulk
{"index":{"_id":3}}
{"goodsName":"苹果UA55RU7520JXXZ 53英寸 4K高清","skuCode":"skuCode3","brandName":"美国苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
{"index":{"_id":4}}
{"goodsName":"山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清","skuCode":"skuCode4","brandName":"山东苹果","closeUserCode":["uc001","uc002","uc003"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8488.88","groupPrice":[{"level":"level1","boxLevelPrice":"2488.88"},{"level":"level2","boxLevelPrice":"3488.88"}],"boxPrice":[{"boxType":"box1","boxUserCode":["uc004","uc005","uc006","uc001"],"boxPriceDetail":4488.88},{"boxType":"box2","boxUserCode":["htd007","htd008","htd009","uc0010"],"boxPriceDetail":5488.88}],"boostValue":1.2}
{"index":{"_id":5}}
{"goodsName":"苹果UA55R苹果U7苹果520JXXZ 55英寸 5K超高清","skuCode":"skuCode5","brandName":"三星苹果","closeUserCode":["uc001","uc002","uc003"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8488.88","groupPrice":[{"level":"level1","boxLevelPrice":"2500"},{"level":"level2","boxLevelPrice":"3500"}],"boxPrice":[{"boxType":"box1","boxUserCode":["uc004","uc005","uc006","uc001"],"boxPriceDetail":3588.88},{"boxType":"box2","boxUserCode":["htd007","htd008","htd009","uc0010"],"boxPriceDetail":5588.88}],"boostValue":1.2}
{"index":{"_id":6}}
{"goodsName":"三星UA55RU7520JXXZ 51英寸 4K超高清","skuCode":"skuCode1","brandName":"三星","closeUserCode":["0"],"channelType":"cmccPlatform","shopCode":"sc00001","publicPrice":"8188.88","groupPrice":null,"boxPrice":null,"boostValue":1.2}
{"index":{"_id":7}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["0"],"channelType":"cmccPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd002"],"boxPriceDetail":4288.88}],"boostValue":1.2}
{"index":{"_id":8}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["uc0022"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["uc0022"],"boxPriceDetail":4288.88}],"boostValue":1.2}
{"index":{"_id":9}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["uc0022"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["uc0022"],"boxPriceDetail":4288.88}],"boostValue":1.2}
{"index":{"_id":10}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["uc0022"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["uc0022"],"boxPriceDetail":4288.88}],"boostValue":1.8}

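The _bulk body above is newline-delimited JSON: an action line such as {"index":{"_id":3}}, then the document source on the next line, and the whole body must end with a newline. A minimal Python sketch that builds such a body; `build_bulk_body` is a hypothetical helper, and sending the request (with the Content-Type application/x-ndjson) is left out.

```python
import json


def build_bulk_body(docs):
    """Build an NDJSON _bulk body from (doc_id, source) pairs, one index action per document."""
    lines = []
    for doc_id, source in docs:
        # Action line: which operation to run and on which document ID
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Source line; ensure_ascii=False keeps Chinese values readable
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    (6, {"goodsName": "三星UA55RU7520JXXZ 51英寸 4K超高清", "skuCode": "skuCode1"}),
    (7, {"goodsName": "三星UA55RU7520JXXZ 52英寸 4K超高清", "skuCode": "skuCode2"}),
])
print(body)
```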
2. Delete

A document is deleted with:

DELETE /<index>/_doc/<_id>

Delete the document with ID 2:

DELETE /my_goods_20210423/_doc/2

Deletion can also be driven by a query with _delete_by_query. The following deletes all products whose store code is "sc00002":

POST /my_goods_20210423/_delete_by_query
{
  "query": {
    "match": {
      "shopCode": "sc00002"
    }
  }
}

3. Update

A document is updated with:

POST /<index>/_update/<_id>
The following examples update the document with ID 1.

Add a field:

POST /my_goods_20210423/_update/1
{
  "doc": {
    "shopName": "小王店铺"
  }
}

Change the store name to "张三店铺":

POST /my_goods_20210423/_update/1
{
  "doc": {
    "shopName": "张三店铺"
  }
}

Document 1 now returns:

{
  "goodsName" : "苹果 51英寸 4K超高清",
  "skuCode" : "skuCode1",
  "brandName" : "苹果",
  "closeUserCode" : [
    "0"
  ],
  "channelType" : "cloudPlatform",
  "shopCode" : "sc00001",
  "publicPrice" : "8188.88",
  "groupPrice" : null,
  "boxPrice" : null,
  "boostValue" : 1.8,
  "shopName" : "张三店铺"
}
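The update API merges the partial doc into the stored document rather than replacing it, which is why all the other fields survive. A simplified Python model of that merge, assuming nested objects are merged key by key while other values, including arrays, are replaced; `merge_partial` is a hypothetical name:

```python
def merge_partial(source, partial):
    """Return a copy of source with partial merged in: nested objects are merged
    key by key, while any other value (including lists) is replaced."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = value
    return merged


doc1 = {"skuCode": "skuCode1", "shopCode": "sc00001", "shopName": "小王店铺"}
doc1 = merge_partial(doc1, {"shopName": "张三店铺"})
print(doc1)  # skuCode and shopCode are untouched, shopName is replaced
```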

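Updates can also be applied to every document matching a query with the _update_by_query API. As an example, we will set publicPrice to 5888.00 for the products whose store code is "sc00002". First, insert a product with document ID 2 for that store: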

POST /my_goods_20210423/_create/2
{"goodsName":"苹果 55英寸 3K超高清","skuCode":"skuCode2","brandName":"苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00002","publicPrice":"6188.88","groupPrice":null,"boxPrice":null,"boostValue":1.0}

Querying document 2 now returns:

{
  "goodsName" : "苹果 55英寸 3K超高清",
  "skuCode" : "skuCode2",
  "brandName" : "苹果",
  "closeUserCode" : [
    "0"
  ],
  "channelType" : "cloudPlatform",
  "shopCode" : "sc00002",
  "publicPrice" : "6188.88",
  "groupPrice" : null,
  "boxPrice" : null,
  "boostValue" : 1.0
}

Set publicPrice to 5888.00 for products whose store code is "sc00002":

POST /my_goods_20210423/_update_by_query
{
  "script": {
    "source": "ctx._source.publicPrice=5888.00",
    "lang": "painless"
  },
  "query": {
    "term": {
      "shopCode": "sc00002"
    }
  }
}

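Query the document again. publicPrice is now the number 5888.0 written by the script, instead of the original string "6188.88":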

GET /my_goods_20210423/_source/2
{
  "shopCode" : "sc00002",
  "brandName" : "苹果",
  "closeUserCode" : [
    "0"
  ],
  "groupPrice" : null,
  "boxPrice" : null,
  "channelType" : "cloudPlatform",
  "boostValue" : 1.0,
  "publicPrice" : 5888.0,
  "goodsName" : "苹果 55英寸 3K超高清",
  "skuCode" : "skuCode2"
}

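When the business requires rebuilding an index, use the _reindex API.
The source and destination can be an existing index, index alias or data stream, and they must be different. _reindex copies documents only, not the settings or mappings of the source index, so create the destination index with the mapping you need beforehand.
You can reindex all of index A into index B, or reindex only the documents that match a query.
The following reindexes the products with skuCode=skuCode2 into the index my_goods_20210423_new: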

POST _reindex
{
  "source": {
    "index": "my_goods_20210423",
    "query": {
      "match": {
        "skuCode": "skuCode2"
      }
    }
  },
  "dest": {
    "index": "my_goods_20210423_new"
  }
}

Search the my_goods_20210423_new index:

GET my_goods_20210423_new/_search

Response:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_goods_20210423_new",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
          "skuCode" : "skuCode2",
          "brandName" : "三星",
          "closeUserCode" : [
            "0"
          ],
          "channelType" : "cmccPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8288.88",
          "groupPrice" : null,
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "htd002"
              ],
              "boxPriceDetail" : 4288.88
            }
          ],
          "boostValue" : 1.2
        }
      },
      {
        "_index" : "my_goods_20210423_new",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 1.0,
        "_source" : {
          "goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
          "skuCode" : "skuCode2",
          "brandName" : "三星",
          "closeUserCode" : [
            "uc0022"
          ],
          "channelType" : "cloudPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8288.88",
          "groupPrice" : null,
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "uc0022"
              ],
              "boxPriceDetail" : 4288.88
            }
          ],
          "boostValue" : 1.2
        }
      },
      {
        "_index" : "my_goods_20210423_new",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 1.0,
        "_source" : {
          "goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
          "skuCode" : "skuCode2",
          "brandName" : "三星",
          "closeUserCode" : [
            "uc0022"
          ],
          "channelType" : "cloudPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8288.88",
          "groupPrice" : null,
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "uc0022"
              ],
              "boxPriceDetail" : 4288.88
            }
          ],
          "boostValue" : 1.2
        }
      },
      {
        "_index" : "my_goods_20210423_new",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : 1.0,
        "_source" : {
          "goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
          "skuCode" : "skuCode2",
          "brandName" : "三星",
          "closeUserCode" : [
            "uc0022"
          ],
          "channelType" : "cloudPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8288.88",
          "groupPrice" : null,
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "uc0022"
              ],
              "boxPriceDetail" : 4288.88
            }
          ],
          "boostValue" : 1.8
        }
      }
    ]
  }
}

4. Retrieve

A document can be retrieved with:

GET <index>/_doc/<_id>

HEAD <index>/_doc/<_id>

GET <index>/_source/<_id>

HEAD <index>/_source/<_id>
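
Get the document with ID 1:

GET /my_goods_20210423/_doc/1

Check whether the document with ID 1 exists:

When you only need to know whether a document exists, HEAD returns less data and is cheaper, which suits such use cases.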

HEAD /my_goods_20210423/_doc/1

Response:

200 - OK

To return only the document's _source:

GET /my_goods_20210423/_source/1

Response:

{
  "goodsName" : "苹果 51英寸 4K超高清",
  "skuCode" : "skuCode1",
  "brandName" : "苹果",
  "closeUserCode" : [
    "0"
  ],
  "channelType" : "cloudPlatform",
  "shopCode" : "sc00001",
  "publicPrice" : "8188.88",
  "groupPrice" : null,
  "boxPrice" : null,
  "boostValue" : 1.8
}

Customizing the returned fields

To fetch only some _source fields, much like selecting specific columns in a database query instead of SELECT *:

GET my_goods_20210423/_source/1?_source_includes=brandName,goodsName

Response:

{
  "brandName" : "苹果",
  "goodsName" : "苹果 51英寸 4K超高清"
}

Multi-get

Elasticsearch can also fetch several documents at once with the _mget API. Fetch the documents with IDs 1 and 2:

GET /my_goods_20210423/_mget
{
  "docs": [
    {
      "_id": "1"
    },
    {
      "_id": "2"
    }
  ]
}

Response (document 2 is not found, so "found" is false):

{
  "docs" : [
    {
      "_index" : "my_goods_20210423",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 7,
      "_seq_no" : 8,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "goodsName" : "苹果 51英寸 4K超高清",
        "skuCode" : "skuCode1",
        "brandName" : "苹果",
        "closeUserCode" : [
          "0"
        ],
        "channelType" : "cloudPlatform",
        "shopCode" : "sc00001",
        "publicPrice" : "8188.88",
        "groupPrice" : null,
        "boxPrice" : null,
        "boostValue" : 1.8,
        "shopName" : "张三店铺"
      }
    },
    {
      "_index" : "my_goods_20210423",
      "_type" : "_doc",
      "_id" : "2",
      "found" : false
    }
  ]
}

Query DSL

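Search queries include full-text queries, compound queries, term-level (structured) queries and so on, and each clause runs either in query context or in filter context.
The two differ:

  1. Query context answers "how well does this document match?": besides deciding whether a document matches, it computes a relevance _score used to rank the results.
  2. Filter context only answers "does this document match?" with yes or no, without a score. Skipping scoring makes filters faster, which matters especially in aggregation-heavy queries and improves overall response time. Elasticsearch can also cache frequently used filter results, whereas query-context scores are not cached.
When to use which: use query context for full-text search and anything that depends on relevance scoring; prefer filter context everywhere else.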

Compound queries

bool query

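A bool query combines must, filter, should and must_not clauses.
must: the clause must match and contributes to the score; filter: the clause must match but is not scored; should: similar to OR in a database query; must_not: the clause must not match, similar to "not equal".
Query: store code = sc00001, and channelType = cloudPlatform, and publicPrice not between 8288 and 8888, and brandName = 苹果. Because minimum_should_match is 1, the single should clause is required here; and since brandName is a keyword field, the term query matches only the exact value 苹果.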

POST /my_goods_20210423/_search
{
  "query": {
    "bool": {
      "must": {
        "term":{
          "shopCode":"sc00001"
        }
      },
      "filter": {
        "term": {
          "channelType": "cloudPlatform"
        }
      },
      "must_not": [
        {
         "range": {
           "publicPrice": {
             "gte": 8288,
             "lte": 8888
           }
         }
        }
      ],
      "should": [
        {
          "term": {
            "brandName": {
              "value": "苹果"
            }
          }
        }
      ],
      "minimum_should_match" : 1
    }
  }
}
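To make the clause semantics concrete, here is a simplified Python model that only decides whether a document is returned and ignores scoring. Each clause is a Python predicate, `bool_matches` is a hypothetical helper, the documents are trimmed copies of documents 1, 3, 6 and 8 with publicPrice as numbers, and the default for minimum_should_match follows the documented rule (1 when there are should clauses but no must or filter clauses, otherwise 0):

```python
def bool_matches(doc, must=(), filter_=(), must_not=(), should=(), minimum_should_match=None):
    """Simplified bool-query matching: each clause is a predicate doc -> bool.
    Scoring is ignored; only whether the document is returned is modeled."""
    if minimum_should_match is None:
        # Documented default: 1 if there are should clauses but no must/filter, else 0
        minimum_should_match = 1 if should and not (must or filter_) else 0
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match


docs = {
    "1": {"shopCode": "sc00001", "channelType": "cloudPlatform", "publicPrice": 8188.88, "brandName": "苹果"},
    "3": {"shopCode": "sc00001", "channelType": "cloudPlatform", "publicPrice": 8388.88, "brandName": "美国苹果"},
    "6": {"shopCode": "sc00001", "channelType": "cmccPlatform", "publicPrice": 8188.88, "brandName": "三星"},
    "8": {"shopCode": "sc00001", "channelType": "cloudPlatform", "publicPrice": 8288.88, "brandName": "三星"},
}

# Mirrors the request above; brandName is a keyword field, so term means exact equality
hits = [
    doc_id for doc_id, doc in docs.items()
    if bool_matches(
        doc,
        must=[lambda d: d["shopCode"] == "sc00001"],
        filter_=[lambda d: d["channelType"] == "cloudPlatform"],
        must_not=[lambda d: 8288 <= d["publicPrice"] <= 8888],
        should=[lambda d: d["brandName"] == "苹果"],
        minimum_should_match=1,
    )
]
print(hits)  # ['1']
```

Only document 1 survives: documents 3 and 8 fall inside the excluded price range, and document 6 fails the channelType filter.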
boosting query

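A boosting query adjusts relevance without excluding documents: every document that matches positive is returned, and those that also match negative have their score lowered by negative_boost.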

POST /my_goods_20210423/_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "skuCode": {
            "value": "skuCode1"
          }
        }
      },
      "negative": {
        "term": {
          "goodsName": {
            "value": "三星"
          }
        }
      }, 
      "negative_boost": 1
    }
  }
}

With negative_boost set to 1 the score is neither lowered nor raised. Response:

"hits" : [
      {
        "_index" : "my_goods_20210423",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.3862942,
        "_source" : {
          "goodsName" : "苹果 51英寸 4K超高清",
          "skuCode" : "skuCode1",
          "brandName" : "苹果",
          "closeUserCode" : [
            "0"
          ],
          "channelType" : "cloudPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8188.88",
          "groupPrice" : null,
          "boxPrice" : null,
          "boostValue" : 1.8,
          "shopName" : "张三店铺"
        }
      },
      {
        "_index" : "my_goods_20210423",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 1.3862942,
        "_source" : {
          "goodsName" : "三星UA55RU7520JXXZ 51英寸 4K超高清",
          "skuCode" : "skuCode1",
          "brandName" : "三星",
          "closeUserCode" : [
            "0"
          ],
          "channelType" : "cmccPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8188.88",
          "groupPrice" : null,
          "boxPrice" : null,
          "boostValue" : 1.2
        }
      }
    ]

Both documents have the same score: "_score" : 1.3862942.

After changing "negative_boost" to 0.2, the response is (some irrelevant fields omitted):

"hits" : [
      {
        "_index" : "my_goods_20210423",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.3862942,
        "_source" : {
          "goodsName" : "苹果 51英寸 4K超高清",
          "skuCode" : "skuCode1",
          "brandName" : "苹果",
          "closeUserCode" : [
            "0"
          ],
          "channelType" : "cloudPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8188.88",
          "groupPrice" : null,
          "boxPrice" : null,
          "boostValue" : 1.8,
          "shopName" : "张三店铺"
        }
      },
      {
        "_index" : "my_goods_20210423",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.27725884,
        "_source" : {
          "goodsName" : "三星UA55RU7520JXXZ 51英寸 4K超高清",
          "skuCode" : "skuCode1",
          "brandName" : "三星",
          "closeUserCode" : [
            "0"
          ],
          "channelType" : "cmccPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8188.88",
          "groupPrice" : null,
          "boxPrice" : null,
          "boostValue" : 1.2
        }
      }
    ]

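The score of document 6 drops to "_score" : 0.27725884 because it matches the negative query: its positive score is multiplied by negative_boost (1.3862942 × 0.2 = 0.27725884), while document 1 keeps its score. negative_boost is documented as a value between 0 and 1.0; the boosting query is meant for demoting documents, not for promoting them.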
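The arithmetic is simple enough to check by hand; this Python sketch just restates it, with `boosting_score` as a hypothetical helper and the positive score taken from the responses above:

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Score of a document returned by a boosting query: the positive-query score,
    multiplied by negative_boost when the document also matches the negative query."""
    return positive_score * negative_boost if matches_negative else positive_score


print(boosting_score(1.3862942, False, 0.2))  # document 1 keeps 1.3862942
print(boosting_score(1.3862942, True, 0.2))   # document 6: about 0.27725884
```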

Constant score query

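When scoring factors such as term frequency (TF) do not matter, use a constant_score query: it wraps a filter and gives every matching document the same score, equal to boost (1.0 by default).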

POST /my_goods_20210423/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "goodsName": "苹果"
        }
      },
      "boost": 1.2
    }
  }
}

Response (some irrelevant fields omitted):

"hits" : [
      {
        "_index" : "my_goods_20210423",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.2,
        "_source" : {
          "goodsName" : "苹果UA55RU7520JXXZ 53英寸 4K高清"
        }
      },
      {
        "_index" : "my_goods_20210423",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.2,
        "_source" : {
          "goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清"
        }
      }
    ]

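Documents 3 and 4 get the same score even though document 4 matches more closely (苹果 appears twice in its name), because constant_score ignores term frequency and every other scoring factor.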

Disjunction max query

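The dis_max (disjunction max) query returns documents that match any of its queries, but uses the score of the best-matching query as the document's score. For example, when searching for 苹果 in both the goods name and the brand name, if the brand clause scores higher than the goods-name clause, the brand score becomes the overall score (ignoring the tie_breaker adjustment).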

GET /my_goods_20210423/_search
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.7,
      "boost": 1.2,
      "queries": [
        {
          "term": {
            "goodsName": {
              "value": "苹果"
            }
          }
        },
        {
          "term": {
            "brandName": {
              "value": "苹果"
            }
          }
        }
        ]
    }
  }
}

Response (irrelevant fields omitted):

"max_score" : 3.0150018,
    "hits" : [
      {
        "_index" : "my_goods_20210423",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 3.0150018,
        "_source" : {
          "goodsName" : "苹果 51英寸 4K超高清",
          "brandName" : "苹果"
        }
      },
      {
        "_index" : "my_goods_20210423",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.3465583,
        "_source" : {
          "goodsName" : "苹果UA55R苹果U7苹果520JXXZ 55英寸 5K超高清",
          "brandName" : "三星苹果"
        }
      },
      {
        "_index" : "my_goods_20210423",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.2337791,
        "_source" : {
          "goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
          "brandName" : "山东苹果"
        }
      },

Analysis:

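  1. Document 1 ranks first. brandName is a keyword field, so the term query on brandName matches only the exact value 苹果, which only document 1 has. Document 1 therefore matches both clauses, while the other documents match only on goodsName.
  2. Documents 5 and 4 match only on goodsName ("三星苹果" and "山东苹果" are not equal to 苹果 in the keyword brandName field), so their order comes mainly from the term frequency in goodsName: 苹果 appears three times in document 5's goodsName and twice in document 4's, so document 5 scores higher, as expected.
  3. What does tie_breaker do? It softens the winner-takes-all behavior (range 0 to 1.0). Without it, dis_max uses only the highest clause score and the other fields have no influence, which is rather extreme. With tie_breaker, the final score is the highest clause score plus tie_breaker times the scores of the other matching clauses; for example, if brandName scores highest: brandName score + goodsName score × tie_breaker.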
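The scoring rule in point 3 can be written as a small Python function. It is a sketch of the documented formula, with the query-level boost assumed to scale the final result; the clause scores in the example are made-up numbers, not values from the response above, and `dis_max_score` is a hypothetical name:

```python
def dis_max_score(clause_scores, tie_breaker=0.0, boost=1.0):
    """clause_scores: scores of the clauses that matched the document."""
    if not clause_scores:
        return 0.0  # no clause matched: the document is not returned at all
    best = max(clause_scores)
    others = sum(clause_scores) - best
    # Highest clause score plus tie_breaker times the other matching clauses
    return (best + tie_breaker * others) * boost


print(dis_max_score([2.0, 1.0]))                   # only the best clause counts
print(dis_max_score([2.0, 1.0], tie_breaker=0.7))  # the weaker clause adds 0.7 of its score
```

With tie_breaker 0 the query behaves as a pure "best field wins"; with 1.0 every matching clause counts fully, like a sum.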
Function score query

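function_score lets you modify the scores of the documents returned by a query, and is the most powerful tool for controlling scoring. The most efficient use is to apply different functions to subsets of the results selected by filters, which benefits from filter caching while still controlling the score.
We want Shandong apples to appear before American apples: search for goods names containing 苹果, and give a weight of 2 when the brand contains 美国 and a weight of 40 when it contains 山东.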

GET /my_goods_20210423/_search
{
  "query": {
    "function_score": {
      "query": {
        "term": {
          "goodsName": {
            "value": "苹果"
          }
        }
      },
      "boost": 2, 
      "functions": [
        {
          "filter": {
            "match":{
              "brandName":"美国"
            }
          },
          "random_score": {
            
          },
          "weight": 2
        },
        {
          "filter": {
            "match":{
              "brandName":"山东"
            }
          },
          "weight": 40
        }
      ],
      "max_boost": 60,
      "score_mode": "max",
      "boost_mode": "multiply",
      "min_score": 2
    }
  }
}

Main part of the response:

    "max_score" : 2.2442641,
    "hits" : [
     {
        "_index" : "my_goods_20210423",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 2.0562985,
        "_source" : {
          "goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
          "brandName" : "山东苹果"
        }
      },
      {
        "_index" : "my_goods_20210423",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.7582327,
        "_source" : {
          "goodsName" : "苹果UA55RU7520JXXZ 53英寸 4K高清",
          "brandName" : "美国苹果"
        }
      }
    ]

Parameter notes:

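  1. score_mode: how the scores of the functions are combined
    multiply: multiply the scores (default)
    sum: add the scores
    avg: average the scores
    first: use the score of the first function whose filter matches
    max: use the highest score
    min: use the lowest score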
Example for avg: if two functions return scores 1 and 2 with weights 3 and 4, the combined score is (1*3+2*4)/(3+4), a weighted average, not (1*3+2*4)/2.
For the other parameters, see the official score functions documentation.
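A Python sketch of how score_mode combines the results of the functions whose filters matched, treating each result as a (score, weight) pair. It is a model of the documented behavior, not Elasticsearch code, and `combine` is a hypothetical name; note in particular that avg divides by the sum of the weights:

```python
import operator
from functools import reduce


def combine(function_results, score_mode="multiply"):
    """function_results: non-empty list of (score, weight) pairs for functions whose filter matched."""
    weighted = [score * weight for score, weight in function_results]
    if score_mode == "multiply":
        return reduce(operator.mul, weighted, 1.0)
    if score_mode == "sum":
        return sum(weighted)
    if score_mode == "avg":
        # Weighted average: divide by the sum of the weights, not by the number of functions
        return sum(weighted) / sum(weight for _, weight in function_results)
    if score_mode == "first":
        return weighted[0]
    if score_mode == "max":
        return max(weighted)
    if score_mode == "min":
        return min(weighted)
    raise ValueError(f"unknown score_mode: {score_mode}")


print(combine([(1, 3), (2, 4)], "avg"))  # (1*3 + 2*4) / (3 + 4)
```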

Full-text queries

match query

The match query is the standard query for full-text search, for example:

GET /my_goods_20210423/_search
{
  "query": {
    "match": {
      "goodsName": "苹果 高清 英寸"
    }
  }
}

A match query is internally a boolean query over the analyzed terms. The "operator" parameter controls how those boolean clauses are combined: and or or (the default is or).

GET /my_goods_20210423/_search
{
  "query": {
    "match": {
      "goodsName": {
        "query": "苹果 高清 英寸",
        "operator": "and"
      }
    }
  }
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

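No hits: with operator and, a document must contain every analyzed term of "苹果 高清 英寸", and no goodsName in this data set contains all three terms.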
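The effect of operator can be modeled in a few lines of Python. The token sets are assumptions for illustration only: real ik_smart output for these product names may differ, and `match_query` is a hypothetical helper:

```python
def match_query(doc_tokens, query_tokens, operator="or"):
    """doc_tokens / query_tokens: terms produced by the analyzer (assumed here)."""
    present = [term in doc_tokens for term in query_tokens]
    return all(present) if operator == "and" else any(present)


# Hypothetical analyzed goodsName; real ik_smart output may differ
doc = {"苹果", "51", "英寸", "4k", "超高清"}
query = ["苹果", "高清", "英寸"]
print(match_query(doc, query))         # True: at least one term is present
print(match_query(doc, query, "and"))  # False: "高清" is not a term of this document
```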

match boolean prefix query

Add two products with English names as test data:

POST my_goods_20210423/_bulk
{"index":{"_id":11}}
{"goodsName":"apple goods test","skuCode":"skuCode3","brandName":"美国苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
{"index":{"_id":12}}
{"goodsName":"apple goods online","skuCode":"skuCode3","brandName":"美国苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
GET /my_goods_20210423/_search
{
  "query": {
    "match_bool_prefix": {
      "goodsName": "apple goods t"
    }
  }
}

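Both newly added products are returned. match_bool_prefix analyzes the input and builds a bool query from the terms: every term except the last becomes a term query, and the last term becomes a prefix query. The query above is roughly equivalent to: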

GET /my_goods_20210423/_search
{
  "query": {
    "bool" : {
      "should": [
        { "term": { "goodsName": "apple" }},
        { "term": { "goodsName": "goods" }},
        { "prefix": { "goodsName": "t"}}
      ]
    }
  }
}
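A Python sketch of that rewrite and of should matching. Lowercasing and whitespace splitting stand in for the field's analyzer (an assumption; goodsName actually uses ik_smart), and both function names are hypothetical:

```python
def match_bool_prefix_as_bool(field, text):
    """Every term except the last becomes a term clause, the last becomes a prefix clause."""
    terms = text.lower().split()
    clauses = [{"term": {field: t}} for t in terms[:-1]]
    clauses.append({"prefix": {field: terms[-1]}})
    return {"bool": {"should": clauses}}


def matches(tokens, bool_query, field):
    """A document matches if any should clause matches one of its tokens."""
    for clause in bool_query["bool"]["should"]:
        if "term" in clause and clause["term"][field] in tokens:
            return True
        if "prefix" in clause and any(t.startswith(clause["prefix"][field]) for t in tokens):
            return True
    return False


q = match_bool_prefix_as_bool("goodsName", "apple goods t")
print(q)
print(matches({"apple", "goods", "online"}, q, "goodsName"))  # True via "apple"/"goods"
```

Because the clauses are should clauses, a document needs only one of them, which is why "apple goods online" is returned even though no term of it starts with t.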
match phrase query

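match_phrase returns documents that contain the analyzed terms of the query as a phrase, that is, all terms, adjacent and in the same order: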

GET /my_goods_20210423/_search
{
  "query": {
    "match_phrase": {
      "goodsName": "apple"
    }
  }
}

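The difference between match_phrase and match: both analyze the query text (and you can specify a different analyzer), but match_phrase requires the resulting terms to appear together and in order, whereas match (with the default or operator) returns documents that contain any of the terms.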

# No hits: no document has the term "t" directly after "goods" ("t" is not a complete term)
GET /my_goods_20210423/_search
{
  "query": {
    "match_phrase": {
      "goodsName": "goods t"
    }
  }
}
# Returns hits: match uses or, so documents containing goods match even though none contains the term t
GET /my_goods_20210423/_search
{
  "query": {
    "match": {
      "goodsName": "goods t"
    }
  }
}
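The difference between the two queries comes down to term positions. A simplified Python model, assuming whitespace tokenization and the default slop of 0; both helpers are hypothetical:

```python
def phrase_match(doc_tokens, phrase_tokens):
    """True if phrase_tokens appear in doc_tokens consecutively and in order (slop 0)."""
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens for i in range(len(doc_tokens) - n + 1))


def any_term_match(doc_tokens, query_tokens):
    """Default match query (operator or): any single query term is enough."""
    return any(term in doc_tokens for term in query_tokens)


doc = "apple goods test".split()
print(phrase_match(doc, ["goods", "t"]))    # False: there is no term "t" after "goods"
print(any_term_match(doc, ["goods", "t"]))  # True: "goods" alone matches
```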
match phrase prefix query

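Returns documents that contain the terms of the query in the given order, treating the last term as a prefix. For example, with "apple goods t", a product named "apple goods test" is returned.
Add one more test document: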

POST my_goods_20210423/_bulk
{"index":{"_id":13}}
{"goodsName":"apple and goods product ","skuCode":"skuCode3","brandName":"美国苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
# Only the document with goodsName "apple goods test" is returned
GET /my_goods_20210423/_search
{
  "query": {
    "match_phrase_prefix": {
      "goodsName": "apple goods t"
    }
  }
}
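A simplified Python model of match_phrase_prefix: the leading terms must appear in order and adjacent, and the last query term only needs to be a prefix of the next term. It assumes whitespace tokenization and ignores max_expansions; `phrase_prefix_match` is a hypothetical name:

```python
def phrase_prefix_match(doc_tokens, query_tokens):
    """Leading terms must match exactly and consecutively; the last one is a prefix."""
    *head, last = query_tokens
    n = len(query_tokens)
    for i in range(len(doc_tokens) - n + 1):
        window = doc_tokens[i:i + n]
        if window[:-1] == head and window[-1].startswith(last):
            return True
    return False


names = {
    11: "apple goods test",
    12: "apple goods online",
    13: "apple and goods product",
}
hits = [doc_id for doc_id, name in names.items()
        if phrase_prefix_match(name.split(), "apple goods t".split())]
print(hits)  # [11]
```

Document 12 fails because "online" does not start with t, and document 13 because "and" sits between apple and goods.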
[Figure: comparison of the four match queries]
Multi-match

Multi-field matching: searches several fields at once, and the type parameter controls how the matches are combined and scored.

# Find documents whose goods name or brand name contains 苹果
POST /my_goods_20210423/_search
{
  "query": {
    "multi_match": {
      "query": "苹果",
      "type": "best_fields", 
      "fields": ["goodsName","brandName"],
      "tie_breaker": 0.3
    }
  }
}

The type values:

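  • best_fields: default. Finds documents that match any field and uses the score of the best-matching field (tie_breaker adds in the other fields)
  • most_fields: finds documents that match any field and combines the scores of all matching fields
  • cross_fields: treats fields with the same analyzer as one big field and looks for each query term in any of the fields
  • phrase: runs a match_phrase query on each field and uses the score of the best field
  • phrase_prefix: runs a match_phrase_prefix query on each field and uses the score of the best field
  • bool_prefix: runs a match_bool_prefix query on each field and combines the scores of each field; see match_bool_prefix above
Let's walk through cross_fields with an example.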
# Create the test index; the field names match the documents inserted below
PUT my_shop
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "first_name":{
        "type":"text"
      },
      "last_name":{
        "type":"text"
      }
    }
  }
}
POST my_shop/_bulk
{"index":{"_id":1}}
{"first_name":"Will","last_name":"Smith","age":25}
{"index":{"_id":2}}
{"first_name":"Smith","last_name":"hello","age":21}
{"index":{"_id":3}}
{"first_name":"Will","last_name":"hello","age":20}

# Find people named Will Smith
GET /my_shop/_search
{
  "query": {
    "multi_match" : {
      "query":      "Will Smith",
      "type":       "cross_fields",
      "fields":     [ "first_name^2", "last_name" ],
      "operator":   "and"
    }
  }
}
# Response
"max_score" : 1.9208363,
    "hits" : [
      {
        "_index" : "my_shop",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.9208363,
        "_source" : {
          "first_name" : "Will",
          "last_name" : "Smith",
          "age" : 25
        }
      }
    ]

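Only document 1 is returned: with cross_fields and operator and, each term (Will, Smith) must appear in at least one of the fields. Note also that first_name^2 doubles the weight of first_name; the default boost is 1.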
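Why only document 1 matches can be shown with a small Python model contrasting term-centric matching (cross_fields) with field-centric matching (best_fields or most_fields, where operator and applies per field). Boosts such as ^2 only affect scoring and are ignored; the helpers are hypothetical, and whitespace splitting stands in for the analyzer:

```python
def tokens(text):
    # Stand-in for the standard analyzer: lowercase and split on whitespace
    return set(str(text).lower().split())


def cross_fields_and(doc, fields, query):
    """Term-centric: every query term must appear in at least one of the fields."""
    terms = query.lower().split()
    return all(any(term in tokens(doc.get(f, "")) for f in fields) for term in terms)


def per_field_and(doc, fields, query):
    """Field-centric: all query terms must appear in the same field."""
    terms = query.lower().split()
    return any(all(term in tokens(doc.get(f, "")) for term in terms) for f in fields)


people = {
    1: {"first_name": "Will", "last_name": "Smith", "age": 25},
    2: {"first_name": "Smith", "last_name": "hello", "age": 21},
    3: {"first_name": "Will", "last_name": "hello", "age": 20},
}
fields = ["first_name", "last_name"]
print([i for i, d in people.items() if cross_fields_and(d, fields, "Will Smith")])  # [1]
print([i for i, d in people.items() if per_field_and(d, fields, "Will Smith")])     # []
```

This is why cross_fields suits data such as names split across several fields: no single field contains the whole query.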

Term-level queries

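Term-level queries find documents based on exact values in structured data, such as date ranges, IP addresses and prices. The following shows how each one is used in the business scenario.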

  • exists query
    Returns documents that contain an indexed value for a field.
# Return documents that have an indexed goodsName value
GET /my_goods_20210423/_search
{
  "query": {
    "exists": {
      "field": "goodsName"
    }
  }
}
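  • fuzzy query
    Returns documents that contain terms similar to the search term, which can be used for spelling correction.
# Example adapted from the official documentation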
POST /my_index/_bulk
{ "index": { "_id": 1 }}
{ "text": "Surprise me!"}
{ "index": { "_id": 2 }}
{ "text": "That was surprising."}
{ "index": { "_id": 3 }}
{ "text": "I wasn't surprised."}

GET /my_index/_search
{
  "query": {
    "fuzzy": {
      "text": {
        "value": "surprize",
        "prefix_length": 1
      }
    }
  }
}
# Response
"hits" : [
      {
        "_index" : "my_index",
        "_type" : "my_type",
        "_id" : "1",
        "_score" : 0.9559981,
        "_source" : {
          "text" : "Surprise me!"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "my_type",
        "_id" : "3",
        "_score" : 0.69983494,
        "_source" : {
          "text" : "I wasn't surprised."
        }
      }

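fuzziness is not set in this query, so it defaults to AUTO, which allows up to 2 edits for terms longer than 5 characters such as surprize. prefix_length is set to 1, so the first character must match exactly; if it is not set, prefix_length defaults to 0.

  1. surprising is 4 edits away from surprize, more than the 2 allowed, so document 2 is not returned.
  2. surprize is a misspelling of surprise (s->z), a single edit, well within the 2 allowed; surprised is 2 edits away, so document 3 is returned as well.
    To improve performance, limit how many term variations the query expands to with max_expansions (default 50), and avoid a high max_expansions especially when prefix_length is 0.
    A larger prefix_length also reduces the number of variations examined, but typos inside the prefix can then no longer be corrected; and allowing too many edits returns results the user did not expect.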
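The edit distances above can be checked with a short Python sketch. It uses plain Levenshtein distance (Elasticsearch also counts a transposition of two adjacent characters as one edit by default, which is left out here), the AUTO thresholds from the documentation, and a crude regex tokenizer in place of the standard analyzer; all function names are hypothetical:

```python
import re


def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def auto_fuzziness(term):
    """fuzziness AUTO (AUTO:3,6): 0 edits for 1-2 chars, 1 for 3-5, 2 above that."""
    if len(term) < 3:
        return 0
    return 1 if len(term) < 6 else 2


def fuzzy_match(token, value, prefix_length=0):
    # The first prefix_length characters must match exactly
    if token[:prefix_length] != value[:prefix_length]:
        return False
    return levenshtein(token, value) <= auto_fuzziness(value)


def tokenize(text):
    # Rough stand-in for the standard analyzer: lowercase, keep letters, digits, apostrophes
    return re.findall(r"[a-z0-9']+", text.lower())


texts = {1: "Surprise me!", 2: "That was surprising.", 3: "I wasn't surprised."}
hits = [doc_id for doc_id, text in texts.items()
        if any(fuzzy_match(tok, "surprize", prefix_length=1) for tok in tokenize(text))]
print(hits)  # [1, 3]
```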
  • ids query
    Returns documents with the given IDs.
GET /my_goods_20210423/_search
{
  "query": {
    "ids" : {
      "values" : ["1", "4", "5"]
    }
  }
}
  • prefix query
    Returns documents that contain a term starting with the given prefix in the provided field.
GET /my_shop_test/_search
{
  "query": {
    "prefix": {
      "shopName": {
        "value": "bo"
      }
    }
  }
}
# Response
"hits" : [
      {
        "_index" : "my_shop_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "shopName" : "box",
          "shopCode" : "Smith"
        }
      },
      {
        "_index" : "my_shop_test",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "shopName" : "booex",
          "shopCode" : "act"
        }
      }
    ]
  • range query
    A range query works like greater-than and less-than conditions in a database query.
GET my_goods_20210423/_search
{
  "query": {
    "range": {
      "publicPrice": {
        "gte": 2000,
        "lte": 8488
      }
    }
  }
}
  1. gt: greater than
  2. gte: greater than or equal to
  3. lt: less than
  4. lte: less than or equal to
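The bounds combine with AND. A Python sketch of the check, using a few publicPrice values from the sample data and the bounds from the request above; `in_range` is a hypothetical helper:

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Every bound that is given must hold; omitted bounds are open."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


prices = {"1": 8188.88, "3": 8388.88, "4": 8488.88, "6": 8188.88}
print([doc_id for doc_id, p in prices.items() if in_range(p, gte=2000, lte=8488)])  # ['1', '3', '6']
```

Document 4 (8488.88) is excluded because it is above lte 8488.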
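  • regexp query
    Regular-expression query: find store codes that start with 's', contain any characters (of any length) in between and end with '1'. Lucene regular expressions always match the whole term, so no ^ or $ anchors are needed.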
GET my_goods_20210423/_search
{
  "query": {
    "regexp": {
      "shopCode": {
        "value": "s.*1",
        "flags": "ALL",
        "case_insensitive": true,
        "max_determinized_states": 10000,
        "rewrite": "constant_score"
      }
    }
  }
}
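The key point is that Lucene regular expressions are anchored. In Python terms that means re.fullmatch rather than re.search. This sketch assumes the `.` and `*` operators used here behave the same in both syntaxes (Lucene's syntax differs in other respects), and `regexp_term_match` is a hypothetical helper:

```python
import re


def regexp_term_match(term, pattern, case_insensitive=False):
    """Lucene-style matching: the pattern must match the whole term."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.fullmatch(pattern, term, flags) is not None


for code in ["sc00001", "sc00002", "SC00011", "xsc00001"]:
    print(code, regexp_term_match(code, "s.*1", case_insensitive=True))
```

"xsc00001" does not match even though it contains "sc00001", because the pattern must cover the whole term.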
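  • term query
# Returns documents whose field contains exactly the given term; avoid term on text fields, because their values are analyzed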
GET my_goods_20210423/_search
{
  "query": {
    "term": {
      "brandName": {
        "value": "三星",
        "boost": 1.0
      }
    }
  }
}
  • terms query
    Returns documents that contain one or more of the exact terms given.
GET /my_goods_20210423/_search
{
  "query": {
    "terms": {
      "brandName": [ "美国", "三星" ],
      "boost": 1.0
    }
  }
}
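  • terms_set query
    Returns documents that contain at least a minimum number of the exact terms given. terms_set works like terms, but also defines how many terms must match, here read per document from the required_matches field.
# Define a new product index
PUT /my_goods_info
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "sale_property": {
        "type": "keyword"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}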

# Add 3 test products
# Sale properties: white, 64G, standard product
PUT /my_goods_info/_doc/1?refresh
{
  "name": "apple",
  "sale_property": [ "white", "64","standard" ],
  "required_matches": 2
}
# Black, 32G, non-standard product
PUT /my_goods_info/_doc/2?refresh
{
  "name": "apple",
  "sale_property": [ "black", "32","no standard" ],
  "required_matches": 2
}
# Black, 64G, non-standard product
PUT /my_goods_info/_doc/3?refresh
{
  "name": "apple",
  "sale_property": [ "black", "64","no standard" ],
  "required_matches": 2
}
# Search
GET /my_goods_info/_search
{
  "query": {
    "terms_set": {
      "sale_property": {
        "terms": [ "white", "64"],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}
# Response
"hits" : [
      {
        "_index" : "my_goods_info",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.1149836,
        "_source" : {
          "name" : "apple",
          "sale_property" : [
            "white",
            "64",
            "standard"
          ],
          "required_matches" : 2
        }
      }
    ]
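The matching rule can be restated in Python: count how many query terms appear among the document's values and compare with that document's required_matches. A sketch with the three test documents; `terms_set_match` is a hypothetical helper:

```python
def terms_set_match(field_values, query_terms, required_matches):
    """Match when at least required_matches of the query terms are among the field values."""
    return len(set(field_values) & set(query_terms)) >= required_matches


goods = {
    "1": (["white", "64", "standard"], 2),
    "2": (["black", "32", "no standard"], 2),
    "3": (["black", "64", "no standard"], 2),
}
hits = [doc_id for doc_id, (props, required) in goods.items()
        if terms_set_match(props, ["white", "64"], required)]
print(hits)  # ['1']
```

Document 3 contains only 64 from the query terms, one fewer than its required_matches of 2.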
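  • wildcard query
    Returns documents that contain terms matching a wildcard pattern (* matches any sequence of characters, including none; ? matches a single character).
# Store codes that start with sc and end with 1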
GET /my_goods_20210423/_search
{
  "query": {
    "wildcard": {
      "shopCode": {
        "value": "sc*1",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}
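A Python sketch of wildcard matching that translates the pattern into an anchored regular expression, escaping every character except * and ?. It does not handle escaped wildcard characters, and `wildcard_match` is a hypothetical helper:

```python
import re


def wildcard_to_regex(pattern):
    """'*' matches any sequence (including an empty one), '?' exactly one character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)


def wildcard_match(term, pattern):
    # Like the wildcard query, the pattern has to cover the whole term
    return re.fullmatch(wildcard_to_regex(pattern), term, re.DOTALL) is not None


print(wildcard_match("sc00001", "sc*1"), wildcard_match("sc00002", "sc*1"))  # True False
```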
© Copyright belongs to the author. For reprints or content collaboration, please contact the author.