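Composite ranking: tuning scores with the function score query
Scoring and sorting
- By default, Elasticsearch sorts results by each document's relevance score
- Results can also be sorted by one or more specified fields
- Sorting by the relevance score (_score) alone cannot satisfy some requirements
- It offers no finer control over how relevance feeds into the ordering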
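function score query
- After the query finishes, the function score query re-scores every matching document and sorts by the newly generated scores
- Several built-in score functions are provided
  - weight: sets a simple, non-normalized weight for each document
  - field value factor: uses a field's value to modify _score, for example taking "popularity" or "number of likes" into account
  - random score: produces a different random ordering for each user
  - decay functions: score against a field's value; the closer the value is to a given origin, the higher the score
  - script score: a custom script takes full control of the scoring logic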
# Reset the index, then add three posts that differ only in "votes"
DELETE blogs
PUT /blogs/_doc/1
{
"title": "About popularity",
"content": "In this post we will talk about...",
"votes": 0
}
PUT /blogs/_doc/2
{
"title": "About popularity",
"content": "In this post we will talk about...",
"votes": 100
}
PUT /blogs/_doc/3
{
"title": "About popularity",
"content": "In this post we will talk about...",
"votes": 1000000
}
# Default boost_mode (multiply): new score = _score * votes, so the post with 0 votes scores 0
POST /blogs/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "popularity",
"fields": [ "title", "content" ]
}
},
"field_value_factor": {
"field": "votes"
}
}
}
}
# log1p modifier: new score = _score * log10(1 + votes), which flattens very large vote counts
POST /blogs/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "popularity",
"fields": [ "title", "content" ]
}
},
"field_value_factor": {
"field": "votes",
"modifier": "log1p"
}
}
}
}
# factor: new score = _score * log10(1 + 0.1 * votes)
POST /blogs/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "popularity",
"fields": [ "title", "content" ]
}
},
"field_value_factor": {
"field": "votes",
"modifier": "log1p" ,
"factor": 0.1
}
}
}
}
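# boost_mode sum adds the function value to _score instead of multiplying; max_boost caps the function value at 3
POST /blogs/_search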
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "popularity",
"fields": [ "title", "content" ]
}
},
"field_value_factor": {
"field": "votes",
"modifier": "log1p" ,
"factor": 0.1
},
"boost_mode": "sum",
"max_boost": 3
}
}
}
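The effect of the field_value_factor requests above can be approximated offline. The following is a minimal Python sketch, assuming that log1p means log10(1 + factor * value), that the default boost_mode is multiply, and that max_boost caps the function value before it is combined with the query score. QUERY_SCORE is a hypothetical stand-in for the multi_match score, not a value Elasticsearch would return.

```python
import math


def field_value_factor(value, factor=1.0, modifier="none"):
    """Function value, assumed to be modifier(factor * value)."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        # Assumption: add 1, then take the base-10 logarithm.
        return math.log10(1 + scaled)
    raise ValueError(f"unsupported modifier: {modifier}")


def combine(query_score, function_value, boost_mode="multiply", max_boost=float("inf")):
    """Combine the query score with the (capped) function value."""
    capped = min(function_value, max_boost)
    if boost_mode == "multiply":
        return query_score * capped
    if boost_mode == "sum":
        return query_score + capped
    raise ValueError(f"unsupported boost_mode: {boost_mode}")


QUERY_SCORE = 1.0  # hypothetical relevance score, identical for all three posts

for votes in (0, 100, 1_000_000):
    raw = combine(QUERY_SCORE, field_value_factor(votes))
    smoothed = combine(QUERY_SCORE, field_value_factor(votes, modifier="log1p"))
    scaled = combine(QUERY_SCORE, field_value_factor(votes, 0.1, "log1p"))
    summed = combine(QUERY_SCORE, field_value_factor(votes, 0.1, "log1p"), "sum", 3)
    print(votes, raw, round(smoothed, 3), round(scaled, 3), round(summed, 3))
```

Under these assumptions the raw vote count dominates the score completely, while log1p and factor compress the gap between 100 and 1000000 votes to a small multiple.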
# random_score: a fixed seed gives a reproducible random ordering
POST /blogs/_search
{
"query": {
"function_score": {
"random_score": {
"seed": 911119
}
}
}
}
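The idea behind a seeded random score, a different but stable ordering per user, can be sketched in Python. This is only an illustration: hashing the seed together with a document id is an assumption made here for the example and is not how Elasticsearch computes random_score internally. The function names are hypothetical.

```python
import hashlib


def random_score(seed, doc_id):
    """Deterministic pseudo-random value in [0, 1) derived from seed and document id."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


def ranked(doc_ids, seed):
    """Order documents by their seeded random score, highest first."""
    return sorted(doc_ids, key=lambda doc_id: random_score(seed, doc_id), reverse=True)


docs = ["1", "2", "3"]
# The same seed (for example, one seed per user) always yields the same ordering.
print(ranked(docs, seed=911119))
print(ranked(docs, seed=911119))
```

Using a per-user value as the seed keeps each user's ordering consistent across requests while still varying the ordering between users.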
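Term & phrase suggester
What is a search suggestion
- Modern search engines usually offer a suggest-as-you-type feature
- It autocompletes or corrects the query while the user is typing. Helping the user enter more precise keywords improves how well documents match in the search that follows
- On Google, the input is autocompleted at first; once it reaches a certain length and a misspelling makes completion impossible, similar words or phrases are suggested instead
Elasticsearch suggester API
- In Elasticsearch, this kind of functionality is implemented through the suggester API
- How it works: the input text is broken into tokens, then similar terms are looked up in the index's term dictionary and returned
- Elasticsearch provides four kinds of suggesters for different use cases
  - term & phrase suggester
  - completion & context suggester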
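term suggester
- A suggester is a special kind of search. "text" holds the text supplied at call time, usually whatever the user typed in the user interface
- The user's input "lucen" is a misspelling
- The lookup runs against the specified field "body"; when a term has no match there, suggestions are returned for it
term suggester - missing mode
- Search for "lucen rock"
- Each suggestion carries a score. Similarity is computed with the Levenshtein edit distance, whose core idea is how many characters of one word must change for it to equal another word. Many optional parameters control how fuzzy the match may be, for example "max_edits"
- Suggestion modes
  - missing - no suggestion is given if the term already exists in the index
  - popular - suggest terms that occur more frequently
  - always - suggest whether or not the term exists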
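The edit-distance idea and the three suggestion modes can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the Elasticsearch implementation: DOC_FREQS is a hypothetical term dictionary loosely modeled on the sample documents, and options such as prefix_length are ignored.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(term, doc_freqs, mode="missing", max_edits=2):
    """Return candidate terms ordered by edit distance, then by frequency."""
    if mode == "missing" and term in doc_freqs:
        return []  # the term exists in the index, so nothing is suggested
    own_freq = doc_freqs.get(term, 0)
    candidates = []
    for other, freq in doc_freqs.items():
        if other == term:
            continue
        distance = levenshtein(term, other)
        if distance > max_edits:
            continue
        if mode == "popular" and freq <= own_freq:
            continue  # popular keeps only terms more frequent than the input
        candidates.append((distance, -freq, other))
    return [other for _, _, other in sorted(candidates)]


# Hypothetical term -> document frequency dictionary.
DOC_FREQS = {"lucene": 2, "elasticsearch": 3, "rocks": 2, "rock": 1, "stack": 2}

for mode in ("missing", "popular", "always"):
    print(mode, suggest("lucen", DOC_FREQS, mode), suggest("rock", DOC_FREQS, mode))
```

In this model "lucen" is one edit away from "lucene" in every mode, while "rock" only receives the suggestion "rocks" in popular and always mode, because "rock" itself exists in the dictionary.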
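phrase suggester
- The phrase suggester adds some extra logic on top of the term suggester
- Some parameters
  - suggest mode: missing, popular, always
  - max errors: the maximum number of terms that may be misspelled
  - confidence: a threshold factor applied to the input phrase's score; only candidates scoring above it are returned, which limits the number of results. Defaults to 1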
# Rebuild the index with a plain "body" text field for the term and phrase suggester examples
DELETE articles
POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{ "body": "elasticsearch is rock solid"}
# The standard analyzer does not stem, so "rocks" and "rock" stay separate terms
POST _analyze
{
"analyzer": "standard",
"text": ["Elk stack rocks rock"]
}
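What the _analyze request above is meant to show can be approximated in a few lines of Python. This is a rough sketch, assuming that lowercasing plus splitting on non-word characters is close enough to the standard analyzer for plain English text; the real analyzer uses Unicode text segmentation, and the function name is hypothetical.

```python
import re


def standard_like_tokens(text):
    """Lowercase the text and split it on non-word characters (no stemming)."""
    return re.findall(r"\w+", text.lower())


tokens = standard_like_tokens("Elk stack rocks rock")
print(tokens)  # ['elk', 'stack', 'rocks', 'rock']
```

Because no stemming happens, "rocks" and "rock" end up as two distinct terms, which is why one can be suggested as a correction for the other.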
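# missing: suggest only for terms that are not in the index ("lucen"); "rock" exists, so it gets no suggestion
POST /articles/_search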
{
"size": 1,
"query": {
"match": {
"body": "lucen rock"
}
},
"suggest": {
"term-suggestion": {
"text": "lucen rock",
"term": {
"suggest_mode": "missing",
"field": "body"
}
}
}
}
# popular: suggest only terms that occur in more documents than the input term
POST /articles/_search
{
"suggest": {
"term-suggestion": {
"text": "lucen rock",
"term": {
"suggest_mode": "popular",
"field": "body"
}
}
}
}
# always: suggest for every term, whether or not it exists in the index
POST /articles/_search
{
"suggest": {
"term-suggestion": {
"text": "lucen rock",
"term": {
"suggest_mode": "always",
"field": "body"
}
}
}
}
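# prefix_length 0 lets the first character differ ("hocks" -> "rocks"); sort by frequency ranks more frequent terms first
POST /articles/_search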
{
"suggest": {
"term-suggestion": {
"text": "lucen hocks",
"term": {
"suggest_mode": "always",
"field": "body",
"prefix_length":0,
"sort": "frequency"
}
}
}
}
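# Phrase suggester: allow up to 2 misspelled terms, return the top candidates (confidence 0), and wrap corrections in <em> tags
POST /articles/_search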
{
"suggest": {
"my-suggestion": {
"text": "lucne and elasticsear rock hello world ",
"phrase": {
"field": "body",
"max_errors":2,
"confidence":0,
"direct_generator":[{
"field":"body",
"suggest_mode":"always"
}],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}
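Autocomplete and context-based suggestions
The completion suggester
- The completion suggester provides autocomplete: every character the user types triggers an immediate query to the backend to look for matches
- Performance requirements are strict, so Elasticsearch uses a different data structure instead of the inverted index: the analyzed data is encoded into an FST and stored alongside the index. Elasticsearch loads the whole FST into memory, which makes lookups fast
- An FST can only be used for prefix lookups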
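The prefix-only behavior can be illustrated with a short Python sketch. It deliberately swaps the FST for a sorted list searched with binary search, which is a much simpler structure that shares the relevant property: only inputs that start with the typed prefix can be found. The class and method names are hypothetical.

```python
import bisect


class PrefixIndex:
    """Sorted-list stand-in for an FST: supports prefix lookups only."""

    def __init__(self, inputs):
        # Keep (lowercased key, original text) pairs in sorted order.
        self._entries = sorted((text.lower(), text) for text in inputs)
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix, size=5):
        """Return up to `size` inputs that start with the prefix."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self._keys, prefix)
        results = []
        for key, original in self._entries[start:]:
            if not key.startswith(prefix):
                break  # matching keys are contiguous in sorted order
            results.append(original)
            if len(results) == size:
                break
        return results


index = PrefixIndex([
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
])
print(index.complete("elk "))
print(index.complete("rocks"))
```

Here "elk " finds only "Elk stack rocks", and "rocks" finds nothing even though two inputs contain the word, because neither input starts with it.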
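Steps for using the completion suggester
- Define the mapping, using the "completion" type
- Index the data
- Run a "suggest" query to get the suggestions
What is the context suggester
- An extension of the completion suggester
- It lets the search carry additional context. For example, for the input "star":
  - Coffee-related: suggest "starbucks"
  - Movie-related: suggest "star wars"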
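Implementing the context suggester
- Two types of context can be defined
  - category - an arbitrary string
  - geo - geographic location information
- Steps for implementing the context suggester
  - Define a mapping
  - Index the data, adding context information to each document
  - Run the suggestion query together with a context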
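The three steps can be mirrored in a toy Python model: register inputs under a category, then restrict prefix completion to one category at query time. This is a sketch of the concept only, assuming a single category context per input; the class and its methods are hypothetical and do not correspond to any Elasticsearch client API.

```python
from collections import defaultdict


class ContextSuggester:
    """Toy completion suggester whose inputs are grouped by category context."""

    def __init__(self):
        self._by_category = defaultdict(list)

    def index(self, text, category):
        """Store an input together with its category context."""
        self._by_category[category].append(text)

    def suggest(self, prefix, category):
        """Return inputs in the given category that start with the prefix."""
        prefix = prefix.lower()
        candidates = self._by_category.get(category, [])
        return sorted(text for text in candidates if text.lower().startswith(prefix))


suggester = ContextSuggester()
suggester.index("star wars", category="movies")
suggester.index("starbucks", category="coffee")

print(suggester.suggest("sta", category="coffee"))
print(suggester.suggest("sta", category="movies"))
```

The same prefix "sta" yields "starbucks" in the coffee context and "star wars" in the movies context, which is the behavior the context suggester adds on top of plain completion.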
Precision and recall
- Precision
  - completion > phrase > term
- Recall
  - term > phrase > completion
- Performance
  - completion > phrase > term
# Completion suggester example: map "title_completion" with the "completion" type
DELETE articles
PUT articles
{
"mappings": {
"properties": {
"title_completion":{
"type": "completion"
}
}
}
}
POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}
# Prefix lookup against the completion field
POST articles/_search?pretty
{
"size": 0,
"suggest": {
"article-suggester": {
"prefix": "elk ",
"completion": {
"field": "title_completion"
}
}
}
}
# Context suggester example: a completion field with a category context named "comment_category"
DELETE comments
PUT comments
PUT comments/_mapping
{
"properties": {
"comment_autocomplete":{
"type": "completion",
"contexts":[{
"type":"category",
"name":"comment_category"
}]
}
}
}
# Index a suggestion tagged with the "movies" category
POST comments/_doc
{
"comment":"I love the star war movies",
"comment_autocomplete":{
"input":["star wars"],
"contexts":{
"comment_category":"movies"
}
}
}
# Index a suggestion tagged with the "coffee" category
POST comments/_doc
{
"comment":"Where can I find a Starbucks",
"comment_autocomplete":{
"input":["starbucks"],
"contexts":{
"comment_category":"coffee"
}
}
}
# Only suggestions in the "coffee" context are returned for the prefix "sta"
POST comments/_search
{
"suggest": {
"MY_SUGGESTION": {
"prefix": "sta",
"completion":{
"field":"comment_autocomplete",
"contexts":{
"comment_category":"coffee"
}
}
}
}
}