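1. How scoring works in Elasticsearch

Lucene's scoring formula

tf (term frequency): how often the term occurs in the document
idf (inverse document frequency): roughly 1 / the number of documents that contain the term
Normalization factor (field-length norm)
Boost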
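
To make the four factors concrete, here is a minimal Python sketch of the classic Lucene TF-IDF similarity, which was the default in Elasticsearch before 5.0. It leaves out the coordination factor and query normalization, and the function names and sample numbers are illustrative assumptions, not Lucene's API.

```python
import math

def tf(freq):
    # Term frequency: square root of how often the term occurs in the field.
    return math.sqrt(freq)

def idf(num_docs, doc_freq):
    # Inverse document frequency: rarer terms get a higher weight.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def field_norm(num_terms):
    # Length normalization: shorter fields score higher for the same match.
    return 1.0 / math.sqrt(num_terms)

def term_score(freq, num_docs, doc_freq, num_terms, boost=1.0):
    # Simplified per-term score: tf * idf^2 * norm * boost.
    return tf(freq) * idf(num_docs, doc_freq) ** 2 * field_norm(num_terms) * boost

# Hypothetical numbers: the term occurs 4 times in a 16-term field and
# appears in 9 of the index's 1000 documents.
plain = term_score(freq=4, num_docs=1000, doc_freq=9, num_terms=16)
boosted = term_score(freq=4, num_docs=1000, doc_freq=9, num_terms=16, boost=2.0)
print(plain, boosted)
```

Doubling the boost doubles the term's contribution, while a longer field or a more common term lowers it.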

2. Boosting

A document's score can be boosted either at index time or at query time.

Index-time boosting

curl -XPUT 'localhost:9200/get-together' -d '{
  "mappings":{
    "group":{
      "properties":{
        "name":{
          "boost":2.0,
          "type":"string"
        }
      }
    }
  }
}'

Query-time boosting

curl -XPOST 'localhost:9200/get-together/_search?pretty' -d '{
  "query":{
    "bool":{
      "should":[{
        "match":{
          "description":{
            "query":"elasticsearch big data",
            "boost":2.5
          }
        }
      }
      ]
    }
  }
}'

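3. How a document gets scored

Setting "explain" to true makes Elasticsearch return, for each hit, a breakdown of how its score was computed.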

curl -XPOST 'localhost:9200/get-together/_search?pretty' -d '{
  "query":{
    "match":{
      "description":"elasticsearch"
    }
  },
  "explain":true
}'

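4. Using query rescoring to reduce the performance impact of scoring

Rescoring means that after the initial query has run, a second round of scoring is performed on the set of results it returned.

Rescore settings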
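
A rescore request has roughly the shape below. This sketch builds the body as a Python dict so that it can be serialized and sent to the _search endpoint. The window size, the weights and the phrase query are illustrative assumptions rather than recommended values, and combined_score is a hypothetical helper that mirrors the default "total" score mode, in which the two weighted scores are added.

```python
import json

# A cheap initial query, followed by a costlier phrase query that only
# rescores the top `window_size` hits on each shard.
body = {
    "query": {"match": {"description": "elasticsearch"}},
    "rescore": {
        "window_size": 20,
        "query": {
            "rescore_query": {
                "match_phrase": {
                    "description": {"query": "elasticsearch hadoop", "slop": 5}
                }
            },
            "query_weight": 0.8,
            "rescore_query_weight": 1.3,
        },
    },
}

def combined_score(original, rescored, query_weight=1.0, rescore_query_weight=1.0):
    # Default score_mode "total": add the two weighted scores.
    return query_weight * original + rescore_query_weight * rescored

print(json.dumps(body, indent=2))
print(combined_score(2.0, 1.0, query_weight=0.8, rescore_query_weight=1.3))
```

Because the expensive query runs only on the top hits, its cost is bounded by window_size rather than by the number of matching documents.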

5. Customizing scores with function_score

curl -XPOST 'localhost:9200/get-together/_search?pretty' -d '{
  "query":{
    "function_score":{
      "query":{
        "match":{
          "description":"elasticsearch big data"
        }
      },
      "functions":[]
    }
  }
}'

weight

curl -XPOST 'localhost:9200/get-together/_search?pretty' -d '{
  "query":{
    "function_score":{
      "query":{
        "match":{
          "description":"elasticsearch big data"
        }
      },
      "functions":[
      {
        "weight":1.5,
        "filter":{"term":{"description":"hadoop"}}
      }
      ]
    }
  }
}'
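
The effect of the weight function can be sketched in a few lines of Python. With the default score_mode and boost_mode of function_score (both multiply), a document that matches the filter has its query score multiplied by the weight, and every other document keeps its original score. The function name and sample values are hypothetical.

```python
def weighted_score(query_score, description_terms, weight=1.5, term="hadoop"):
    # The weight applies only when the filter (a term match) accepts the document.
    if term in description_terms:
        return query_score * weight
    return query_score

print(weighted_score(2.0, {"elasticsearch", "hadoop"}))
print(weighted_score(2.0, {"elasticsearch", "big", "data"}))
```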

The field_value_factor function

Uses data stored in the document itself to influence its score.

curl -XPOST 'localhost:9200/get-together/_search?pretty' -d '{
  "query":{
    "function_score":{
      "query":{
        "match":{
          "description":"elasticsearch big data"
        }
      },
      "functions":[
      {
        "field_value_factor":{
          "field":"reviews",
          "factor":2.5,
          "modifier":"ln"
        }
      }
      ]
    }
  }
}'
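
As a rough illustration of what this function computes: field_value_factor multiplies the field value by factor, applies the modifier, and (with the default boost_mode of multiply) multiplies the result into the query score. The Python sketch below uses made-up review counts, and its function names are hypothetical.

```python
import math

def field_value_factor(value, factor=2.5, modifier=math.log):
    # ln(factor * field value), matching "factor": 2.5 and "modifier": "ln".
    return modifier(factor * value)

def final_score(query_score, reviews):
    # Default boost_mode "multiply": query score times function score.
    return query_score * field_value_factor(reviews)

for reviews in (1, 10, 100):
    print(reviews, round(final_score(1.0, reviews), 3))
```

The logarithm dampens the effect of large values, so a document with ten times more reviews does not score ten times higher. A value of 0 would make ln undefined, which is why Elasticsearch also offers modifiers such as ln1p.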

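Scripts

You can write a script to compute the score. The field name inside the script is written with escaped double quotes, because a bare single quote would end the single-quoted curl body.

curl -XPOST 'localhost:9200/get-together/_search?pretty' -d '{
  "query":{
    "function_score":{
      "query":{
        "match":{
          "description":"elasticsearch big data"
        }
      },
      "functions":[
      {
        "script_score":{
          "script":"Math.log(doc[\"attendees\"].values.size()*myweight)",
          "params":{"myweight":3}
        }
      }
      ],
      "boost_mode":"replace"
    }
  }
}'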

Decay functions

These apply a score that falls off gradually as the value of a field moves away from a chosen origin.
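
The Gaussian variant (gauss) can be sketched as follows; linear and exp are the other built-in shapes. The parameter names origin, offset, scale and decay follow the function_score options, while the function name and the distances in the example are invented for illustration.

```python
import math

def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    # Values within `offset` of the origin keep the full score of 1.0.
    distance = max(0.0, abs(value - origin) - offset)
    # Choose sigma^2 so that the score equals `decay` at distance `scale`.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))

# Hypothetical distances in km from the origin, with offset 0.1 and scale 2.
for d in (0.0, 0.1, 2.1, 5.0):
    print(d, round(gauss_decay(d, origin=0.0, scale=2.0, offset=0.1), 4))
```

Inside the offset the score stays at 1.0, at offset plus scale it has dropped to decay, and beyond that it keeps shrinking toward 0.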

7. Sorting with scripts

curl -XPOST 'localhost:9200/get-together/_search?pretty' -d '{
  "query":{
    "match":{
      "description":"elasticsearch big data"
    }
  },
  "sort":[
  {
    "_script":{
      "script":"doc[\"attendees\"].values.size()",
      "type":"number",
      "order":"desc"
    }
  },
  "_score"
  ]
}'

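8. Field data

When Elasticsearch has to sort on a field or return aggregations over it, it must be able to determine quickly, for every document, which terms to use for the sort or the aggregation.

The field data cache

Warmers are queries that Elasticsearch runs automatically to make sure its internal caches are populated, so that the data a query needs is loaded before the query is first used for real.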

curl -XPOST 'localhost:9200/get-together' -d '{
  "mappings":{
    "group":{
      "properties":{
        "title":{
          "type":"string",
          "fielddata":{"loading":"eager"}
        }
      }
    }
  }
}'

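Where field data is used

Sorting by a field
Aggregating on a field
Accessing values in scripts with doc['xxx']
Inside function_score queries, for example by the field_value_factor function
Retrieving field contents from field data with fielddata_fields in a search request
Caching the IDs of parent/child document relationships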

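Managing field data

Limit the amount of memory that field data may use.
Use the circuit breaker, which caps the size of field data as a percentage of the JVM heap.
Use doc values to avoid the memory cost: at index time, doc values take the data that would otherwise be loaded into memory and store it on disk alongside the regular index data.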

Elasticsearch in Action, Chapter 6: Searching with relevancy
