Scenario: single-string, multi-field queries
Characteristics of the bool query in this scenario:
Bool query example:
PUT /blogs/_doc/1
{
"title": "Quick brown rabbits",
"body": "Brown rabbits are commonly seen."
}
PUT /blogs/_doc/2
{
"title": "Keeping pets healthy",
"body": "My quick brown fox eats rabbits on a regular basis."
}
# Query 1: bool query
POST /blogs/_search
{
"query": {
"bool": {
"should": [
{ "match": { "title": "Brown fox" }},
{ "match": { "body": "Brown fox" }}
]
}
}
}
Result of query 1
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.90425634,
"hits" : [
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.90425634,
"_source" : {
"title" : "Quick brown rabbits",
"body" : "Brown rabbits are commonly seen."
}
},
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.77041256,
"_source" : {
"title" : "Keeping pets healthy",
"body" : "My quick brown fox eats rabbits on a regular basis."
}
}
]
}
}
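Question: why does document 2, whose body field matches the query so well, rank below document 1?
Add "explain": true at the top level of the request body of query 1 and run it again. The explanation contains "description" : "sum of:", which shows that the bool query adds up the _score of every matching should clause, and the document with the larger sum ranks higher. Document 1 contains "brown" in both its title and its body, so its two clause scores add up to more than document 2's single body score, even though only document 2's body contains both "brown" and "fox". What if we want the document with the single best-matching field to rank first?
Disjunction Max Query is a good choice.
- It returns any document that matches at least one of the wrapped queries, and uses the score of the best-matching query (the highest score) as the document's final score.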
Disjunction Max Query
Disjunction Max Query example:
# Query 2: Disjunction Max Query
POST blogs/_search
{
"query": {
"dis_max": {
"queries": [
{
"match": {
"title": "Brown fox"
}
},
{
"match": {
"body": "Brown fox"
}
}
]
}
}
}
Result of query 2
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.77041256,
"hits" : [
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.77041256,
"_source" : {
"title" : "Keeping pets healthy",
"body" : "My quick brown fox eats rabbits on a regular basis."
}
},
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931471,
"_source" : {
"title" : "Quick brown rabbits",
"body" : "Brown rabbits are commonly seen."
}
}
]
}
}
With "explain": true we can see that the strategy is now "description" : "max of:", meaning the highest clause score is used as the final score.
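The difference between the two strategies can be checked by hand. The following is a minimal Python sketch, not Elasticsearch code. The per-field scores are assumptions read off the results of query 1 and query 2: document 1's body score is back-computed as 0.90425634 - 0.6931471, and a clause that does not match is treated as contributing 0.

```python
# Per-field scores for the query "Brown fox". These are assumed values,
# taken or derived from the results of query 1 and query 2, not from
# an actual explain output.
field_scores = {
    "1": {"title": 0.6931471, "body": 0.2111092},
    "2": {"title": 0.0, "body": 0.77041256},
}


def bool_should_score(scores):
    """bool/should: sum the scores of all matching clauses."""
    return sum(scores.values())


def dis_max_score(scores):
    """dis_max without tie_breaker: keep only the best clause score."""
    return max(scores.values())


def rank(score_fn):
    """Return document ids ordered by descending score."""
    return sorted(
        field_scores,
        key=lambda doc_id: score_fn(field_scores[doc_id]),
        reverse=True,
    )


bool_ranking = rank(bool_should_score)
dis_max_ranking = rank(dis_max_score)
print("bool should:", bool_ranking)
print("dis_max:    ", dis_max_ranking)
```

Summing puts document 1 first because it scores in two fields, while taking the maximum puts document 2 first because its single best field scores higher.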
Adjusting scores with Tie Breaker
Example
# Query 3
POST blogs/_search
{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "Quick pets" }},
{ "match": { "body": "Quick pets" }}
]
}
}
}
Result of query 3
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.6931471,
"hits" : [
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931471,
"_source" : {
"title" : "Quick brown rabbits",
"body" : "Brown rabbits are commonly seen."
}
},
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.6931471,
"_source" : {
"title" : "Keeping pets healthy",
"body" : "My quick brown fox eats rabbits on a regular basis."
}
}
]
}
}
As we can see, both documents get the same score for this query. Let's add "tie_breaker": 0.2 and run the query again to see the effect.
# Query 4
POST blogs/_search
{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "Quick pets" }},
{ "match": { "body": "Quick pets" }}
],
"tie_breaker": 0.2
}
}
}
Result of query 4
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.815141,
"hits" : [
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.815141,
"_source" : {
"title" : "Keeping pets healthy",
"body" : "My quick brown fox eats rabbits on a regular basis."
}
},
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931471,
"_source" : {
"title" : "Quick brown rabbits",
"body" : "Brown rabbits are commonly seen."
}
}
]
}
}
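What tie_breaker does:
- take the score of the best-matching clause;
- multiply the score of every other matching clause by tie_breaker;
- add the best score and the weighted scores together to get the final score (in this example tie_breaker = 0.2, so the final score is the best field's score + 0.2 * the scores of the other matching fields).
tie_breaker is a floating-point value between 0 and 1: 0 (the default) means only the best-matching clause counts, and 1 means all matching clauses count equally.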
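The arithmetic can be reproduced in a few lines. This is a minimal Python sketch of the formula described above, not Elasticsearch's implementation; the function name is hypothetical. The per-clause scores are assumptions: 0.6931471 comes from the result of query 3, and document 2's second clause score, 0.6099695, is back-computed from the result of query 4 as (0.815141 - 0.6931471) / 0.2 rather than read from an explain output.

```python
def dis_max_with_tie_breaker(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the sum of the other scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others


# Assumed per-clause scores for the query "Quick pets".
doc1_scores = [0.6931471]             # only the title clause matches
doc2_scores = [0.6931471, 0.6099695]  # title and body clauses both match

for tb in (0.0, 0.2, 1.0):
    score1 = dis_max_with_tie_breaker(doc1_scores, tb)
    score2 = dis_max_with_tie_breaker(doc2_scores, tb)
    print(f"tie_breaker={tb}: doc1={score1:.6f} doc2={score2:.6f}")
```

With tie_breaker set to 0 the two documents tie, as in query 3. Any positive value lets document 2's second matching field break the tie, and a value of 1 makes the result equal to a plain sum of the clause scores.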
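Summary
When a single query string is searched across multiple fields:
- a bool query with should clauses sums the _score of every matching field and sorts by that sum;
- Disjunction Max Query uses the _score of the best-matching field as the document's _score for sorting;
- when the ranking from Disjunction Max Query should also take the other fields into account, tie_breaker adds their scores, scaled by its value, to the final score. Tune tie_breaker until you get the ranking you want.
If you spot any mistakes, corrections are welcome. Thanks!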