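In early 2021 I signed up for the Elasticsearch 百人大作战 (hundred-author campaign) organized by Alibaba Cloud to co-write the 《ELK操作手册》 (ELK Operations Manual). I was lucky enough to work on part of the basic-capabilities material, the basic search operations, and I have collected some of it here for reference and study.
Business background
In B2B (2B) commerce, product search and display are subject to business rules. For example, only buyers that have a partnership with a supplier can see the products in that supplier's shop; buyers without a partnership are not shown them. In addition, the same product may show one price to customer A and a different price to other customers, so that prices differ by member and by group. In short, product sales in B2B are closed and customer-specific. All later examples are set against this background, so that readers can learn the Elasticsearch document, index, and query operations in a realistic business scenario.
Defining the mapping
The product fields are as follows:
goodsName: product name
skuCode: product SKU code
brandName: product brand name
channelType: channel type
shopCode: shop code
publicPrice: selling price (base price, visible to everyone)
closeUserCode: codes of the members the product is restricted to
groupPrice: group prices, stored as a nested type with these sub-fields:
boxLevelPrice: price for the group
level: group level
Define the product mapping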
PUT my_goods_20210423
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 1
}
},
"mappings": {
"properties": {
"goodsName": {
"type": "text",
"analyzer": "ik_smart"
},
"skuCode": {
"type": "keyword"
},
"brandName": {
"type": "keyword"
},
"channelType": {
"type": "keyword"
},
"shopCode": {
"type": "keyword"
},
"publicPrice": {
"type": "float"
},
"closeUserCode": {
"type": "text",
"analyzer": "standard"
},
"boostValue": {
"type": "keyword"
},
"groupPrice": {
"type": "nested",
"properties": {
"boxLevelPrice": {
"type": "float"
},
"level": {
"type": "text"
}
}
}
}
}
}
Document operations
The core operations are the following.
1. Create
Documents can be created with any of these request forms:
PUT /<target>/_doc/<_id>
POST /<target>/_doc/
PUT /<target>/_create/<_id>
POST /<target>/_create/<_id>
Taking POST /<target>/_create/<_id> as the example, the request below creates the product document with ID 1:
POST /my_goods_20210423/_create/1
{
"goodsName":"苹果 51英寸 4K超高清",
"skuCode":"skuCode1",
"brandName":"苹果",
"closeUserCode":[
"0"
],
"channelType":"cloudPlatform",
"shopCode":"sc00001",
"publicPrice":"8188.88",
"groupPrice":null,
"boxPrice":null,
"boostValue":1.8
}
Elasticsearch also supports batch inserts through the _bulk API:
POST my_goods_20210423/_bulk
{"index":{"_id":3}}
{"goodsName":"苹果UA55RU7520JXXZ 53英寸 4K高清","skuCode":"skuCode3","brandName":"美国苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
{"index":{"_id":4}}
{"goodsName":"山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清","skuCode":"skuCode4","brandName":"山东苹果","closeUserCode":["uc001","uc002","uc003"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8488.88","groupPrice":[{"level":"level1","boxLevelPrice":"2488.88"},{"level":"level2","boxLevelPrice":"3488.88"}],"boxPrice":[{"boxType":"box1","boxUserCode":["uc004","uc005","uc006","uc001"],"boxPriceDetail":4488.88},{"boxType":"box2","boxUserCode":["htd007","htd008","htd009","uc0010"],"boxPriceDetail":5488.88}],"boostValue":1.2}
{"index":{"_id":5}}
{"goodsName":"苹果UA55R苹果U7苹果520JXXZ 55英寸 5K超高清","skuCode":"skuCode5","brandName":"三星苹果","closeUserCode":["uc001","uc002","uc003"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8488.88","groupPrice":[{"level":"level1","boxLevelPrice":"2500"},{"level":"level2","boxLevelPrice":"3500"}],"boxPrice":[{"boxType":"box1","boxUserCode":["uc004","uc005","uc006","uc001"],"boxPriceDetail":3588.88},{"boxType":"box2","boxUserCode":["htd007","htd008","htd009","uc0010"],"boxPriceDetail":5588.88}],"boostValue":1.2}
{"index":{"_id":6}}
{"goodsName":"三星UA55RU7520JXXZ 51英寸 4K超高清","skuCode":"skuCode1","brandName":"三星","closeUserCode":["0"],"channelType":"cmccPlatform","shopCode":"sc00001","publicPrice":"8188.88","groupPrice":null,"boxPrice":null,"boostValue":1.2}
{"index":{"_id":7}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["0"],"channelType":"cmccPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd002"],"boxPriceDetail":4288.88}],"boostValue":1.2}
{"index":{"_id":8}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["uc0022"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["uc0022"],"boxPriceDetail":4288.88}],"boostValue":1.2}
{"index":{"_id":9}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["uc0022"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["uc0022"],"boxPriceDetail":4288.88}],"boostValue":1.2}
{"index":{"_id":10}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["uc0022"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["uc0022"],"boxPriceDetail":4288.88}],"boostValue":1.8}
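When a bulk body is generated by a program instead of being typed into the console, every document needs an action line in front of it and the body has to end with a newline. The following is a minimal Python sketch that uses only the standard library; build_bulk_body is a hypothetical helper name and not part of any Elasticsearch client, and the two sample documents are shortened versions of the ones above.

```python
import json


def build_bulk_body(docs):
    """Serialize (doc_id, source) pairs into an NDJSON body for _bulk.

    Hypothetical helper for illustration: each document becomes two lines,
    an "index" action line followed by the document source.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # ensure_ascii=False keeps Chinese text readable in the payload
        lines.append(json.dumps(source, ensure_ascii=False))
    # The last line of a _bulk body must also end with a newline
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    (3, {"skuCode": "skuCode3", "brandName": "美国苹果", "shopCode": "sc00001"}),
    (4, {"skuCode": "skuCode4", "brandName": "山东苹果", "shopCode": "sc00001"}),
])
print(body)
```

The resulting string can be sent as the body of POST my_goods_20210423/_bulk with the content type application/x-ndjson.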
2. Delete
Documents are deleted with this request form:
DELETE /<index>/_doc/<_id>
Delete the document with ID 2:
DELETE /my_goods_20210423/_doc/2
Deletion by condition is also supported through _delete_by_query.
The request below deletes all products whose shop code is "sc00002":
POST /my_goods_20210423/_delete_by_query
{
"query": {
"match": {
"shopCode": "sc00002"
}
}
}
3. Update
Documents are updated with this request form:
POST /<index>/_update/<_id>
Update the document with ID 1
Add a new field:
POST /my_goods_20210423/_update/1
{
"doc": {
"shopName": "小王店铺"
}
}
Change the shop name to "张三店铺":
POST /my_goods_20210423/_update/1
{
"doc": {
"shopName": "张三店铺"
}
}
After the update, the _source of document 1 is:
{
"goodsName" : "苹果 51英寸 4K超高清",
"skuCode" : "skuCode1",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8188.88",
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.8,
"shopName" : "张三店铺"
}
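Updates can also be applied with the _update_by_query API. The example below sets "publicPrice" to 5888.00 for every product whose shop code is "sc00002".
First, insert a product document with ID 2 for that shop: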
POST /my_goods_20210423/_create/2
{"goodsName":"苹果 55英寸 3K超高清","skuCode":"skuCode2","brandName":"苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00002","publicPrice":"6188.88","groupPrice":null,"boxPrice":null,"boostValue":1.0}
Reading the document back at this point returns:
{
"goodsName" : "苹果 55英寸 3K超高清",
"skuCode" : "skuCode2",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00002",
"publicPrice" : "6188.88",
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.0
}
Now set "publicPrice" to 5888.00 where the shop code is "sc00002":
POST /my_goods_20210423/_update_by_query
{
"script": {
"source": "ctx._source.publicPrice=5888.00",
"lang": "painless"
},
"query": {
"term": {
"shopCode": "sc00002"
}
}
}
Read the document again:
GET /my_goods_20210423/_source/2
{
"shopCode" : "sc00002",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"groupPrice" : null,
"boxPrice" : null,
"channelType" : "cloudPlatform",
"boostValue" : 1.0,
"publicPrice" : 5888.0,
"goodsName" : "苹果 55英寸 3K超高清",
"skuCode" : "skuCode2"
}
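When the business requires rebuilding an index, use the _reindex API.
The source and destination can be any existing index, index alias, or data stream.
You can simply reindex index A into index B, or reindex only the documents that match a condition.
The request below reindexes the products with skuCode=skuCode2 into the index my_goods_20210423_new: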
POST _reindex
{
"source": {
"index": "my_goods_20210423",
"query": {
"match": {
"skuCode": "skuCode2"
}
}
},
"dest": {
"index": "my_goods_20210423_new"
}
}
Query the data in the my_goods_20210423_new index:
GET my_goods_20210423_new/_search/
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_goods_20210423_new",
"_type" : "_doc",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
"skuCode" : "skuCode2",
"brandName" : "三星",
"closeUserCode" : [
"0"
],
"channelType" : "cmccPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8288.88",
"groupPrice" : null,
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"htd002"
],
"boxPriceDetail" : 4288.88
}
],
"boostValue" : 1.2
}
},
{
"_index" : "my_goods_20210423_new",
"_type" : "_doc",
"_id" : "8",
"_score" : 1.0,
"_source" : {
"goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
"skuCode" : "skuCode2",
"brandName" : "三星",
"closeUserCode" : [
"uc0022"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8288.88",
"groupPrice" : null,
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"uc0022"
],
"boxPriceDetail" : 4288.88
}
],
"boostValue" : 1.2
}
},
{
"_index" : "my_goods_20210423_new",
"_type" : "_doc",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
"skuCode" : "skuCode2",
"brandName" : "三星",
"closeUserCode" : [
"uc0022"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8288.88",
"groupPrice" : null,
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"uc0022"
],
"boxPriceDetail" : 4288.88
}
],
"boostValue" : 1.2
}
},
{
"_index" : "my_goods_20210423_new",
"_type" : "_doc",
"_id" : "10",
"_score" : 1.0,
"_source" : {
"goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
"skuCode" : "skuCode2",
"brandName" : "三星",
"closeUserCode" : [
"uc0022"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8288.88",
"groupPrice" : null,
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"uc0022"
],
"boxPriceDetail" : 4288.88
}
],
"boostValue" : 1.8
}
}
]
}
}
4. Read
Documents can be read with these request forms:
GET <index>/_doc/<_id>
HEAD <index>/_doc/<_id>
GET <index>/_source/<_id>
HEAD <index>/_source/<_id>
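Get the document with ID 1:
GET /my_goods_20210423/_doc/1
Check whether the document with ID 1 exists
HEAD only checks existence; it returns less data and is cheaper, which suits scenarios that need nothing more than an existence check.
HEAD /my_goods_20210423/_doc/1
Response:
200 - OK
Return only the document source
The _source endpoint returns only the _source of the document:
GET /my_goods_20210423/_source/1
Response: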
{
"goodsName" : "苹果 51英寸 4K超高清",
"skuCode" : "skuCode1",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8188.88",
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.8
}
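Customizing the returned fields
Fetch only some fields of _source, similar to selecting specific columns in a database query instead of using select *:
GET my_goods_20210423/_source/1?_source_includes=brandName,goodsName
Response: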
{
"brandName" : "苹果",
"goodsName" : "苹果 51英寸 4K超高清"
}
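Batch reads
Elasticsearch also supports reading several documents at once with the _mget API. The request below fetches the documents with IDs 1 and 2: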
GET /my_goods_20210423/_mget
{
"docs": [
{
"_id": "1"
},
{
"_id": "2"
}
]
}
Response:
{
"docs" : [
{
"_index" : "my_goods_20210423",
"_type" : "_doc",
"_id" : "1",
"_version" : 7,
"_seq_no" : 8,
"_primary_term" : 1,
"found" : true,
"_source" : {
"goodsName" : "苹果 51英寸 4K超高清",
"skuCode" : "skuCode1",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8188.88",
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.8,
"shopName" : "张三店铺"
}
},
{
"_index" : "my_goods_20210423",
"_type" : "_doc",
"_id" : "2",
"found" : false
}
]
}
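Query DSL
Index queries include full-text queries, compound queries, structured queries, and more. They run in one of two contexts: query and filter.
The two differ:
- query context: answers whether a document matches and how well it matches, returning a _score that can be used for ranking
- filter context: only answers whether a document matches, without a relevance score. Filters are often the better fit for aggregations: skipping the scoring step makes Elasticsearch faster and shortens the overall response time. Filter results can also be cached, whereas scored queries are not.
When to use which: use query for full-text search and anything that depends on scoring; use filter in all other cases.
Compound queries
bool query
A bool query combines must, filter, should, and must_not clauses.
must clauses have to match and contribute to the score; filter clauses have to match but are not scored; should behaves like OR in a database query; must_not excludes matching documents, like a not-equal condition.
Query: shop code = sc00001, channelType = cloudPlatform, publicPrice not between 8288 and 8888, and brand equal to 苹果 (the should clause is effectively required here because minimum_should_match is 1)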
POST /my_goods_20210423/_search
{
"query": {
"bool": {
"must": {
"term":{
"shopCode":"sc00001"
}
},
"filter": {
"term": {
"channelType": "cloudPlatform"
}
},
"must_not": [
{
"range": {
"publicPrice": {
"gte": 8288,
"lte": 8888
}
}
}
],
"should": [
{
"term": {
"brandName": {
"value": "苹果"
}
}
}
],
"minimum_should_match" : 1
}
}
}
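How the four clause types decide whether a document matches can be modelled in a few lines of Python. This is only a toy model of the matching rule, not of scoring and not of how Elasticsearch is implemented; bool_matches, term, and price_between are hypothetical helper names, and the three documents are shortened copies of the sample data above.

```python
def bool_matches(doc, must=(), filter_=(), must_not=(), should=(),
                 minimum_should_match=0):
    """Toy model of bool matching; each clause is a predicate on a dict."""
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


def term(field, value):
    """Exact-value predicate, loosely mirroring a term query on a keyword."""
    return lambda doc: doc.get(field) == value


def price_between(low, high):
    """Inclusive range predicate, mirroring gte/lte on publicPrice."""
    return lambda doc: low <= float(doc["publicPrice"]) <= high


docs = {
    "1": {"shopCode": "sc00001", "channelType": "cloudPlatform",
          "publicPrice": "8188.88", "brandName": "苹果"},
    "3": {"shopCode": "sc00001", "channelType": "cloudPlatform",
          "publicPrice": "8388.88", "brandName": "美国苹果"},
    "6": {"shopCode": "sc00001", "channelType": "cmccPlatform",
          "publicPrice": "8188.88", "brandName": "三星"},
}

hits = [
    doc_id for doc_id, doc in docs.items()
    if bool_matches(
        doc,
        must=[term("shopCode", "sc00001")],
        filter_=[term("channelType", "cloudPlatform")],
        must_not=[price_between(8288, 8888)],
        should=[term("brandName", "苹果")],
        minimum_should_match=1,
    )
]
print(hits)  # ['1']
```

Document 3 is dropped by must_not (its price falls inside the range) and document 6 by filter (different channel), so only document 1 remains.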
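boosting query
A boosting query adjusts relevance scores: documents that match the positive query are returned, and those that also match the negative query have their score multiplied by negative_boost.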
POST /my_goods_20210423/_search
{
"query": {
"boosting": {
"positive": {
"term": {
"skuCode": {
"value": "skuCode1"
}
}
},
"negative": {
"term": {
"goodsName": {
"value": "三星"
}
}
},
"negative_boost": 1
}
}
}
With negative_boost=1 the score is neither raised nor lowered. Response:
"hits" : [
{
"_index" : "my_goods_20210423",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.3862942,
"_source" : {
"goodsName" : "苹果 51英寸 4K超高清",
"skuCode" : "skuCode1",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8188.88",
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.8,
"shopName" : "张三店铺"
}
},
{
"_index" : "my_goods_20210423",
"_type" : "_doc",
"_id" : "6",
"_score" : 1.3862942,
"_source" : {
"goodsName" : "三星UA55RU7520JXXZ 51英寸 4K超高清",
"skuCode" : "skuCode1",
"brandName" : "三星",
"closeUserCode" : [
"0"
],
"channelType" : "cmccPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8188.88",
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.2
}
}
]
Both documents have the same score: "_score" : 1.3862942
After changing "negative_boost" to 0.2, the response becomes (some unrelated fields omitted):
"hits" : [
{
"_index" : "my_goods_20210423",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.3862942,
"_source" : {
"goodsName" : "苹果 51英寸 4K超高清",
"skuCode" : "skuCode1",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8188.88",
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.8,
"shopName" : "张三店铺"
}
},
{
"_index" : "my_goods_20210423",
"_type" : "_doc",
"_id" : "6",
"_score" : 0.27725884,
"_source" : {
"goodsName" : "三星UA55RU7520JXXZ 51英寸 4K超高清",
"skuCode" : "skuCode1",
"brandName" : "三星",
"closeUserCode" : [
"0"
],
"channelType" : "cmccPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8188.88",
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.2
}
}
]
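The score of document ID=6 drops to "_score" : 0.27725884 because it matches the negative query. A negative_boost between 0 and 1 lowers the score of such documents. The official documentation specifies only the range 0 to 1.0, so the parameter is intended for demoting documents rather than promoting them.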
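The drop can be checked by hand: the score from the positive query is simply multiplied by negative_boost. A small Python sketch of that arithmetic follows; boosting_score is a hypothetical name, and the last digits may differ slightly from Elasticsearch, which computes scores as 32-bit floats.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Score of a hit under a boosting query (simplified model)."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# Document 1 does not match the negative query, document 6 does.
score_doc_1 = boosting_score(1.3862942, matches_negative=False, negative_boost=0.2)
score_doc_6 = boosting_score(1.3862942, matches_negative=True, negative_boost=0.2)
print(score_doc_1, round(score_doc_6, 8))
```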
Constant score query
When the query should not take TF (term frequency) into account, use a constant score query:
POST /my_goods_20210423/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"goodsName": "苹果"
}
},
"boost": 1.2
}
}
}
Response (some unrelated fields omitted):
{
"_index" : "my_goods_20210423",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.2,
"_source" : {
"goodsName" : "苹果UA55RU7520JXXZ 53英寸 4K高清"
}
},
{
"_index" : "my_goods_20210423",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.2,
"_source" : {
"goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清"
}
}
}
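Documents ID=3 and ID=4 get the same score even though ID=4 is the closer match (苹果 appears twice in its goodsName), because the effect of term frequency on scoring is ignored.
Disjunction max query
The disjunction max (dis_max) query returns every document that matches any of its sub-queries, but uses only the score of the best-matching sub-query as the score of the document. For example, when searching for "苹果" in both the product name and the brand name, if the brand scores higher than the product name, the brand score becomes the overall score (ignoring tie_breaker for the moment).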
GET /my_goods_20210423/_search
{
"query": {
"dis_max": {
"tie_breaker": 0.7,
"boost": 1.2,
"queries": [
{
"term": {
"goodsName": {
"value": "苹果"
}
}
},
{
"term": {
"brandName": {
"value": "苹果"
}
}
}
]
}
}
}
Response (unrelated fields omitted):
"max_score" : 3.0150018,
"hits" : [
{
"_index" : "my_goods_20210423",
"_type" : "_doc",
"_id" : "1",
"_score" : 3.0150018,
"_source" : {
"goodsName" : "苹果 51英寸 4K超高清",
"brandName" : "苹果"
}
},
{
"_index" : "my_goods_20210423",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.3465583,
"_source" : {
"goodsName" : "苹果UA55R苹果U7苹果520JXXZ 55英寸 5K超高清",
"brandName" : "三星苹果"
}
},
{
"_index" : "my_goods_20210423",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.2337791,
"_source" : {
"goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
"brandName" : "山东苹果"
}
},
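Analysis:
- ID=1: the brand is exactly "苹果", which Elasticsearch treats as the best match, so this document ranks first.
- ID=5 and ID=4: their brands ("三星苹果" and "山东苹果") both contain 苹果 and have the same length, but brandName is a keyword field, so the term query on it does not match them and only the goodsName clause counts. The ranking is therefore decided by how often 苹果 occurs in goodsName: three times for ID=5 and twice for ID=4, so ID=4 scores lower than ID=5, as expected.
- What does tie_breaker do? It acts as a buffer (a value between 0 and 1.0). By default dis_max returns the score of the best-matching clause as the score of the whole document and the other clauses contribute nothing, which is somewhat extreme. tie_breaker softens this by giving the other clauses some influence: final score = score of the best clause + tie_breaker * (sum of the scores of the other matching clauses). When brandName is the best clause, this is brandName score + goodsName score * tie_breaker.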
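The tie_breaker rule described above can be written out as a short function. This is a sketch of the formula only; dis_max_score is a hypothetical name, and the per-clause scores 2.0 and 1.0 are made-up values chosen to keep the arithmetic easy to follow, not scores taken from the responses above.

```python
def dis_max_score(clause_scores, tie_breaker=0.0, boost=1.0):
    """Combine per-clause scores the way dis_max does (simplified).

    clause_scores holds one entry per sub-query; None means the
    sub-query did not match the document.
    """
    matching = [score for score in clause_scores if score is not None]
    if not matching:
        return None  # the document is not a hit at all
    best = max(matching)
    others = sum(matching) - best
    return boost * (best + tie_breaker * others)


# Hypothetical scores: brandName clause 2.0, goodsName clause 1.0
print(dis_max_score([2.0, 1.0]))                              # 2.0
print(dis_max_score([2.0, 1.0], tie_breaker=0.7))
print(dis_max_score([2.0, 1.0], tie_breaker=0.7, boost=1.2))
```

With tie_breaker=0 only the best clause counts; with tie_breaker=1 the result is the plain sum of all matching clauses.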
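Function score query
function_score lets you control how documents are scored; it is the ultimate tool for controlling the scoring process. The most efficient usage applies different functions to subsets of the results selected by filters, which benefits from filter caching while still controlling the score.
Suppose we want 山东 (Shandong) apples to rank before 美国 (American) apples: search for product names containing "苹果", use a weight of 2 when the brand contains "美国", and a weight of 40 when it contains "山东".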
GET /my_goods_20210423/_search
{
"query": {
"function_score": {
"query": {
"term": {
"goodsName": {
"value": "苹果"
}
}
},
"boost": 2,
"functions": [
{
"filter": {
"match":{
"brandName":"美国"
}
},
"random_score": {
},
"weight": 2
},
{
"filter": {
"match":{
"brandName":"山东"
}
},
"weight": 40
}
],
"max_boost": 60,
"score_mode": "max",
"boost_mode": "multiply",
"min_score": 2
}
}
}
Key part of the response:
"max_score" : 2.2442641,
"hits" : [
{
"_index" : "my_goods_20210423",
"_type" : "_doc",
"_id" : "4",
"_score" : 2.0562985,
"_source" : {
"goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
"brandName" : "山东苹果"
}
},
{
"_index" : "my_goods_20210423",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.7582327,
"_source" : {
"goodsName" : "苹果UA55RU7520JXXZ 53英寸 4K高清",
"brandName" : "美国苹果"
}
}
]
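Some of the parameters explained:
- score_mode
multiply: the default, scores are multiplied
sum: scores are added
avg: scores are averaged
first: the score of the first function whose filter matches
max: the highest score is used
min: the lowest score is used
avg example: if two functions return the scores 1 and 2 and their weights are 3 and 4, the combined score is (1*3+2*4)/(3+4).
For the remaining details, see the official score-functions documentation.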
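The score_mode options can be tried out with the numbers from the avg example. The sketch below models only how the function scores are combined, under the assumption that each function score is first multiplied by its weight; combine_function_scores is a hypothetical name, and boost_mode, max_boost, and min_score are left out.

```python
import math


def combine_function_scores(scores, weights, score_mode="multiply"):
    """Combine function scores according to score_mode (simplified)."""
    weighted = [score * weight for score, weight in zip(scores, weights)]
    if score_mode == "multiply":
        return math.prod(weighted)
    if score_mode == "sum":
        return sum(weighted)
    if score_mode == "avg":
        # Weighted average: divide by the sum of weights, not by the count
        return sum(weighted) / sum(weights)
    if score_mode == "first":
        return weighted[0]
    if score_mode == "max":
        return max(weighted)
    if score_mode == "min":
        return min(weighted)
    raise ValueError(f"unknown score_mode: {score_mode}")


# Two functions returning 1 and 2, with weights 3 and 4
for mode in ("multiply", "sum", "avg", "first", "max", "min"):
    print(mode, combine_function_scores([1, 2], [3, 4], mode))
```

For avg this gives (1*3+2*4)/(3+4) = 11/7, roughly 1.57, rather than (1*3+2*4)/2.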
Full-text queries
match query
match is the standard full-text query. Example:
GET /my_goods_20210423/_search
{
"query": {
"match": {
"goodsName": "苹果 高清 英寸"
}
}
}
A match query is a boolean-type query: the "operator" parameter controls how the boolean clauses are combined. It accepts and and or; the default is or.
GET /my_goods_20210423/_search
{
"query": {
"match": {
"goodsName": {
"query": "苹果 高清 英寸",
"operator": "and"
}
}
}
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
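There are 0 hits: with operator and, every analyzed term of "苹果 高清 英寸" must occur in goodsName, and no product name satisfied that.
match boolean prefix query
Add two test documents with English product names to make testing easier: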
POST my_goods_20210423/_bulk
{"index":{"_id":11}}
{"goodsName":"apple goods test","skuCode":"skuCode3","brandName":"美国苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
{"index":{"_id":12}}
{"goodsName":"apple goods online","skuCode":"skuCode3","brandName":"美国苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
GET /my_goods_20210423/_search
{
"query": {
"match_bool_prefix": {
"goodsName": "apple goods t"
}
}
}
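Both newly added products are returned. match_bool_prefix analyzes the input, turns every term except the last into a term query and the last term into a prefix query, and combines them in a bool query, roughly like this: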
GET /my_goods_20210423/_search
{
"query": {
"bool" : {
"should": [
{ "term": { "goodsName": "apple" }},
{ "term": { "goodsName": "goods" }},
{ "prefix": { "goodsName": "t"}}
]
}
}
}
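match phrase query
Matches documents in which the analyzed query terms occur as a phrase, that is, in the same order and next to each other: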
GET /my_goods_20210423/_search
{
"query": {
"match_phrase": {
"goodsName": "apple"
}
}
}
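match_phrase versus match: both analyze the query text (you can also specify the analyzer), but match_phrase treats the resulting terms as one unit that must appear in order, while match searches for the individual terms independently.
#No results, because no document contains the phrase 'goods t'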
GET /my_goods_20210423/_search
{
"query": {
"match_phrase": {
"goodsName": "goods t"
}
}
}
#Returns data: operator defaults to or, and the documents contain the term goods
GET /my_goods_20210423/_search
{
"query": {
"match": {
"goodsName": "goods t"
}
}
}
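match phrase prefix query
Returns documents that contain the words of the query in the given order, with the last word treated as a prefix. For "apple goods t", documents whose product name contains "apple goods test" are returned.
Add one more test document: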
POST my_goods_20210423/_bulk
{"index":{"_id":13}}
{"goodsName":"apple and goods product ","skuCode":"skuCode3","brandName":"美国苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
#Only the document with goodsName : apple goods test is returned
GET /my_goods_20210423/_search
{
"query": {
"match_phrase_prefix": {
"goodsName": "apple goods t"
}
}
}
Summary: comparing the four match queries
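One way to see the differences side by side is a toy model that splits text on whitespace and applies each matching rule to the three English test documents. This is only a sketch: real analysis, scoring, slop, and max_expansions are ignored, match uses the default or operator, and all function names are hypothetical.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on spaces."""
    return text.lower().split()


def contains_phrase(doc_tokens, query_tokens, last_is_prefix=False):
    """True if query_tokens occur consecutively and in order in doc_tokens."""
    n = len(query_tokens)
    if n == 0:
        return False
    for start in range(len(doc_tokens) - n + 1):
        window = doc_tokens[start:start + n]
        head_ok = window[:-1] == query_tokens[:-1]
        if last_is_prefix:
            tail_ok = window[-1].startswith(query_tokens[-1])
        else:
            tail_ok = window[-1] == query_tokens[-1]
        if head_ok and tail_ok:
            return True
    return False


def match(doc, query):
    """Any query term occurs in the document (operator or)."""
    doc_tokens = tokens(doc)
    return any(term in doc_tokens for term in tokens(query))


def match_phrase(doc, query):
    return contains_phrase(tokens(doc), tokens(query))


def match_bool_prefix(doc, query):
    """Terms for all but the last word, prefix for the last, OR-ed together."""
    doc_tokens, query_tokens = tokens(doc), tokens(query)
    if not query_tokens:
        return False
    term_hit = any(term in doc_tokens for term in query_tokens[:-1])
    prefix_hit = any(t.startswith(query_tokens[-1]) for t in doc_tokens)
    return term_hit or prefix_hit


def match_phrase_prefix(doc, query):
    return contains_phrase(tokens(doc), tokens(query), last_is_prefix=True)


names = ["apple goods test", "apple goods online", "apple and goods product "]
query = "apple goods t"
results = {
    fn.__name__: [fn(name, query) for name in names]
    for fn in (match, match_phrase, match_bool_prefix, match_phrase_prefix)
}
for name, flags in results.items():
    print(name, flags)
```

In this model match and match_bool_prefix accept all three names, match_phrase accepts none (no name contains the standalone token t), and match_phrase_prefix accepts only "apple goods test".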
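Multi-match
multi_match runs a match query over several fields; the type parameter changes how the fields are matched and scored.
#Find documents whose product name or brand name contains 苹果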
POST /my_goods_20210423/_search
{
"query": {
"multi_match": {
"query": "苹果",
"type": "best_fields",
"fields": ["goodsName","brandName"],
"tie_breaker": 0.3
}
}
}
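The type parameter in detail:
- best_fields: the default; matches any of the fields and uses the score of the best-matching field as the score of the whole query
- most_fields: matches any of the fields and combines the scores of all matching fields (their sum, averaged)
- cross_fields: matches across fields, checking whether the query terms occur in any of the listed fields as if they were one field
- phrase: runs a match_phrase query on each field and uses the best score as the overall score
- phrase_prefix: runs a match_phrase_prefix query on each field and uses the best score as the overall score
- bool_prefix: runs a match_bool_prefix query on each field and combines the per-field scores; see the bool_prefix documentation for details
A worked example with cross_fields:
#Create the test index
PUT my_shop
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"first_name":{
"type":"text"
},
"last_name":{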
"type":"text"
}
}
}
}
POST my_shop/_bulk
{"index":{"_id":1}}
{"first_name":"Will","last_name":"Smith","age":25}
{"index":{"_id":2}}
{"first_name":"Smith","last_name":"hello","age":21}
{"index":{"_id":3}}
{"first_name":"Will","last_name":"hello","age":20}
#Find the person named Will Smith
GET /my_shop/_search
{
"query": {
"multi_match" : {
"query": "Will Smith",
"type": "cross_fields",
"fields": [ "first_name^2", "last_name" ],
"operator": "and"
}
}
}
#Response
"max_score" : 1.9208363,
"hits" : [
{
"_index" : "my_shop",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.9208363,
"_source" : {
"first_name" : "Will",
"last_name" : "Smith",
"age" : 25
}
}
]
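Note that first_name^2 doubles the weight of first_name; the default boost is 1.
Term-level queries
Term-level queries search structured data such as date ranges, IP addresses, and prices. The examples below show how they are used in business scenarios.
- exists query
Returns documents that contain an indexed value for the field
#Return the documents that have a goodsName field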
GET /my_goods_20210423/_search
{
"query": {
"exists": {
"field": "goodsName"
}
}
}
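- fuzzy query
Returns documents that contain terms similar to the search term; it can be used for spelling correction in search
#Example taken from the official documentation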
POST /my_index/_bulk
{ "index": { "_id": 1 }}
{ "text": "Surprise me!"}
{ "index": { "_id": 2 }}
{ "text": "That was surprising."}
{ "index": { "_id": 3 }}
{ "text": "I wasn't surprised."}
GET /my_index/_search
{
"query": {
"fuzzy": {
"text": {
"value": "surprize",
"prefix_length": 1
}
}
}
}
#Response
"hits" : [
{
"_index" : "my_index",
"_type" : "my_type",
"_id" : "1",
"_score" : 0.9559981,
"_source" : {
"text" : "Surprise me!"
}
},
{
"_index" : "my_index",
"_type" : "my_type",
"_id" : "3",
"_score" : 0.69983494,
"_source" : {
"text" : "I wasn't surprised."
}
}
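If fuzziness is not set it defaults to AUTO, which allows up to 2 edits for terms longer than 5 characters such as surprize. prefix_length defaults to 0; here it is set to 1, so the first character must match exactly.
- surprising: more than 2 edits away from surprize, so it cannot be corrected and is not returned
- surprize is a misspelling of surprise (s->z), an error in one position, which is within the 2-edit correction range
To improve performance, set max_expansions, which limits the number of terms the fuzzy query expands to.
Also avoid a large fuzziness: it slows the query down, and allowing too many edits returns results the user did not expect. A larger prefix_length, by contrast, reduces the number of terms that have to be examined.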
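The edit distances behind this example can be computed directly. The sketch below implements plain Levenshtein distance (insert, delete, substitute) as a simplification; by default the fuzzy query also counts a swap of two adjacent characters as a single edit, which makes no difference for these three words. edit_distance is a hypothetical helper name.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


distances = {
    term: edit_distance("surprize", term)
    for term in ("surprise", "surprised", "surprising")
}
print(distances)  # {'surprise': 1, 'surprised': 2, 'surprising': 4}
```

Only the terms within 2 edits (surprise and surprised) are candidates, which agrees with documents 1 and 3 being returned.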
- ids query
Returns the documents whose _id is in the given list
GET /my_goods_20210423/_search
{
"query": {
"ids" : {
"values" : ["1", "4", "5"]
}
}
}
- prefix query
Returns documents that contain a specific prefix in the given field
GET /my_shop_test/_search
{
"query": {
"prefix": {
"shopName": {
"value": "bo"
}
}
}
}
#Response
"hits" : [
{
"_index" : "my_shop_test",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"shopName" : "box",
"shopCode" : "Smith"
}
},
{
"_index" : "my_shop_test",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"shopName" : "booex",
"shopCode" : "act"
}
}
]
- range query
A range query is like the greater-than and less-than range conditions in a database
GET my_goods_20210423/_search
{
"query": {
"range": {
"publicPrice": {
"gte": 2000,
"lte": 8488
}
}
}
}
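- gt: greater than
- gte: greater than or equal to
- lt: less than
- lte: less than or equal to
- regexp query
Regular-expression query: find the documents whose shop code starts with 's', continues with any characters of any length, and ends with '1'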
GET my_goods_20210423/_search
{
"query": {
"regexp": {
"shopCode": {
"value": "s.*1",
"flags": "ALL",
"case_insensitive": true,
"max_determinized_states": 10000,
"rewrite": "constant_score"
}
}
}
}
- term query
#Returns documents that contain the exact term; avoid using term on text fields
GET my_goods_20210423/_search
{
"query": {
"term": {
"brandName": {
"value": "三星",
"boost": 1.0
}
}
}
}
- terms query
Returns documents that contain one or more of the exact terms given
GET /my_goods_20210423/_search
{
"query": {
"terms": {
"brandName": [ "美国", "三星" ],
"boost": 1.0
}
}
}
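- terms_set query
Returns documents that contain at least a minimum number of the exact terms. terms_set is like the terms query, except that it additionally defines how many terms must match.
#Define a new product index
PUT /my_goods_info
{
"mappings": {
"properties": {
"name": {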
"type": "keyword"
},
"sale_property": {
"type": "keyword"
},
"required_matches": {
"type": "long"
}
}
}
}
#Add three test products
#Sale properties: white, 64G, standard
PUT /my_goods_info/_doc/1?refresh
{
"name": "apple",
"sale_property": [ "white", "64","standard" ],
"required_matches": 2
}
#black, 32G, non-standard
PUT /my_goods_info/_doc/2?refresh
{
"name": "apple",
"sale_property": [ "black", "32","no standard" ],
"required_matches": 2
}
#black, 64G, non-standard
PUT /my_goods_info/_doc/3?refresh
{
"name": "apple",
"sale_property": [ "black", "64","no standard" ],
"required_matches": 2
}
#Query
GET /my_goods_info/_search
{
"query": {
"terms_set": {
"sale_property": {
"terms": [ "white", "64"],
"minimum_should_match_field": "required_matches"
}
}
}
}
#Response
"hits" : [
{
"_index" : "my_goods_info",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.1149836,
"_source" : {
"name" : "apple",
"sale_property" : [
"white",
"64",
"standard"
],
"required_matches" : 2
}
}
]
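Why only document 1 comes back can be reproduced with a small model of the rule: count how many of the query terms occur in the field and compare that count with the number stored in the document's own required_matches field. This is a sketch of the matching rule only, not of scoring; terms_set_matches is a hypothetical name.

```python
def terms_set_matches(doc, field, terms, minimum_should_match_field):
    """True if enough of the query terms occur in doc[field]."""
    values = doc.get(field, [])
    matched = sum(1 for term in set(terms) if term in values)
    return matched >= doc[minimum_should_match_field]


goods = {
    "1": {"name": "apple", "sale_property": ["white", "64", "standard"],
          "required_matches": 2},
    "2": {"name": "apple", "sale_property": ["black", "32", "no standard"],
          "required_matches": 2},
    "3": {"name": "apple", "sale_property": ["black", "64", "no standard"],
          "required_matches": 2},
}

matching_ids = [
    doc_id for doc_id, doc in goods.items()
    if terms_set_matches(doc, "sale_property", ["white", "64"], "required_matches")
]
print(matching_ids)  # ['1']
```

Document 3 matches one term ("64") but needs two, and document 2 matches none, so both are left out.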
- wildcard query
Returns documents that contain terms matching a wildcard pattern
#Find shop codes that start with sc and end with 1
GET /my_goods_20210423/_search
{
"query": {
"wildcard": {
"shopCode": {
"value": "sc*1",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
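Both the regexp query and the wildcard query compare the pattern with the whole term, not with a substring of it. For the two simple patterns used in this article, that behaviour can be approximated in Python with re.fullmatch and fnmatchcase. This is only an illustration: Lucene regular-expression syntax and Python syntax differ for more complex patterns, and the shop codes other than sc00001 and sc00002 are made-up test values.

```python
import re
from fnmatch import fnmatchcase

# sc00011 and xsc00001 are hypothetical values added for the comparison
shop_codes = ["sc00001", "sc00002", "sc00011", "xsc00001"]

# regexp "s.*1" with case_insensitive: the whole term has to match
regexp_hits = [
    code for code in shop_codes
    if re.fullmatch(r"s.*1", code, flags=re.IGNORECASE)
]

# wildcard "sc*1": * stands for any sequence of characters
wildcard_hits = [code for code in shop_codes if fnmatchcase(code, "sc*1")]

print(regexp_hits)    # ['sc00001', 'sc00011']
print(wildcard_hits)  # ['sc00001', 'sc00011']
```

xsc00001 ends with 1 but is rejected by both patterns, because the match is anchored at the start of the term as well as at the end.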