IR-chapter6: scoring, term weighting and the vector space model

motivation: to rank-order the documents matching a query by giving a score to each (query,document) pair

parametric and zone indexes

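index and retrieve documents by metadata (fields such as author or date, zones such as title or abstract).
parametric index vs zone index: a parametric index covers a field with a fairly small, fixed vocabulary (e.g. a date); a zone index covers arbitrary free text, so its dictionary holds whatever vocabulary appears in the text of that zone.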

parametric search

zone index

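weighted zone scoring: score = sum over zones of g_i * s_i, where s_i is the Boolean match score of zone i and the weights g_i sum to 1
learning weights: the optimal weights g are learned from training examples (query, document, relevance judgment) with a machine learning algorithm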
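
A minimal sketch of weighted zone scoring, assuming each zone matches only when it contains every query term (a Boolean AND). The function name, the zone names and the weight values are illustrative assumptions, not taken from the notes.

```python
def weighted_zone_score(query_terms, doc_zones, weights):
    """Return sum_i g_i * s_i, where s_i is 1 if zone i matches the query."""
    score = 0.0
    for zone, g in weights.items():
        zone_tokens = set(doc_zones.get(zone, "").lower().split())
        # Boolean AND match: s_i = 1 only if every query term occurs in the zone.
        s = 1 if all(t in zone_tokens for t in query_terms) else 0
        score += g * s
    return score


# Hypothetical zone weights (they sum to 1) and a toy document.
weights = {"title": 0.5, "abstract": 0.3, "body": 0.2}
doc = {
    "title": "vector space model",
    "abstract": "term weighting for ranking",
    "body": "the vector space model ranks documents",
}
print(weighted_zone_score(["vector", "model"], doc, weights))
```

Here the title and body zones match but the abstract does not, so the score is the sum of the title and body weights.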

term frequency and weighting

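intuition: a document's score should grow with term frequency, but are all words equally important?
free text query: each document is viewed as a set of term weights (bag of words model, word order is ignored)
score = the sum, over the query terms, of each term's weight in the document
inverse document frequency: idf_t = log(N / df_t), N = number of documents, df_t = number of documents containing t
tf-idf weighting: tf-idf_{t,d} = tf_{t,d} * idf_t
terms with lower document frequency get higher weight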
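
A small sketch of tf-idf scoring, assuming documents are already tokenized into lists of terms and that the logarithm is base 10. The helper names `idf` and `tf_idf_score` and the toy collection are made up for illustration.

```python
import math
from collections import Counter


def idf(term, docs):
    """Inverse document frequency: log10(N / df_t); 0.0 if the term never occurs."""
    df = sum(1 for d in docs if term in d)
    return math.log10(len(docs) / df) if df else 0.0


def tf_idf_score(query, doc, docs):
    """Sum of tf-idf weights of the query terms in one document."""
    tf = Counter(doc)
    return sum(tf[t] * idf(t, docs) for t in query)


# Toy collection of tokenized documents.
docs = [
    ["car", "insurance", "car"],
    ["car", "auto"],
    ["best", "insurance"],
    ["the", "car"],
]
for i, d in enumerate(docs):
    print(i, round(tf_idf_score(["car", "insurance"], d, docs), 4))
```

"auto" appears in one document and "car" in three, so "auto" gets the higher idf, which is the behaviour the notes describe.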

tf-idf

the vector space for scoring

dot products: similarity between two documents
why not the magnitude of the vector difference? It is sensitive to document length: two documents with similar content can be far apart simply because one is much longer.

cosine similarity

length-normalized cosine similarity
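
A sketch of cosine similarity on sparse vectors, assuming each vector is a dict from term to weight (a representation chosen here for illustration). It shows that scaling a document, for example by repeating its text, leaves the similarity unchanged.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean lengths."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention used in this sketch for empty vectors
    return dot / (norm_u * norm_v)


d1 = {"a": 1.0, "b": 2.0}
d2 = {"a": 2.0, "b": 4.0}  # same direction as d1, twice the length
d3 = {"c": 5.0}            # shares no term with d1
print(cosine_similarity(d1, d2), cosine_similarity(d1, d3))
```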

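queries as vectors: the query is treated as a very short document in the same vector space
computation is expensive: comparing the query with every document vector is too costly, so scores are accumulated from the postings lists of the query terms
computing vector scores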

basic algorithm
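
One way the basic algorithm can be sketched: a term-at-a-time pass over the postings of the query terms, followed by length normalization and top-k selection. This is a sketch under stated assumptions, not the chapter's exact pseudocode: the names `build_index` and `cosine_score` are hypothetical, weights are plain tf * idf with a base-10 logarithm, and the query vector is not length-normalized because that does not change the ranking.

```python
import heapq
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to its postings list of (doc_id, term frequency) pairs."""
    postings = defaultdict(list)
    for doc_id, tokens in enumerate(docs):
        for term, tf in Counter(tokens).items():
            postings[term].append((doc_id, tf))
    return postings


def cosine_score(query, docs, postings, k=10):
    """Return the top-k (doc_id, score) pairs for a tokenized query."""
    n = len(docs)
    idf = {t: math.log10(n / len(p)) for t, p in postings.items()}

    # Euclidean length of every document's tf-idf vector.
    lengths = [0.0] * n
    for t, plist in postings.items():
        for doc_id, tf in plist:
            lengths[doc_id] += (tf * idf[t]) ** 2
    lengths = [math.sqrt(x) for x in lengths]

    # Accumulate partial dot products one query term at a time.
    scores = defaultdict(float)
    for t, q_tf in Counter(query).items():
        if t not in postings:
            continue
        w_tq = q_tf * idf[t]
        for doc_id, tf in postings[t]:
            scores[doc_id] += tf * idf[t] * w_tq

    normalized = {d: s / lengths[d] for d, s in scores.items() if lengths[d] > 0}
    return heapq.nlargest(k, normalized.items(), key=lambda kv: kv[1])


docs = [
    ["car", "insurance", "auto"],
    ["car", "best", "car"],
    ["insurance", "policy"],
]
postings = build_index(docs)
print(cosine_score(["auto", "insurance"], docs, postings, k=2))
```

Only documents that share at least one term with the query ever receive a score, which is what makes this cheaper than comparing the query with every document vector.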

Variant tf–idf functions

SMART notation for tf–idf variants.
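
A few of the term-frequency variants covered by the SMART notation can be sketched as below, assuming a base-10 logarithm and a smoothing constant of 0.5 for the augmented form. The function names are illustrative, and returning 0 for an absent term in `tf_augmented` is a choice made for this sketch.

```python
import math


def tf_natural(tf):
    """'n' (natural): the raw term frequency."""
    return float(tf)


def tf_log(tf):
    """'l' (logarithm): sublinear scaling, 1 + log(tf) for tf > 0, else 0."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0


def tf_augmented(tf, max_tf, a=0.5):
    """'a' (augmented): a + (1 - a) * tf / max_tf, with max_tf the largest tf in the document."""
    return a + (1.0 - a) * tf / max_tf if tf > 0 else 0.0


for tf in (1, 10, 1000):
    print(tf, tf_natural(tf), tf_log(tf), tf_augmented(tf, max_tf=1000))
```

Sublinear scaling keeps a term that occurs 1000 times from counting 1000 times as much as a term that occurs once.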

Pivoted normalized document length
the relationship between document length and relevance

linear model
machine learning techniques
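
A hedged sketch of the linear model behind pivoted normalization, assuming the common form (1 - slope) * pivot + slope * length, where `pivot` is the document length at which the original and pivoted normalizations agree. The parameter values below are made up; in practice they would be tuned on relevance data.

```python
def pivoted_norm(length, pivot, slope):
    """Rotate the normalization factor around the pivot point."""
    return (1.0 - slope) * pivot + slope * length


pivot, slope = 10.0, 0.75  # hypothetical values
for length in (5.0, 10.0, 20.0):
    print(length, pivoted_norm(length, pivot, slope))
```

With slope below 1, documents longer than the pivot are divided by a smaller factor than their own length, and documents shorter than the pivot by a larger one, which counteracts the tendency of cosine normalization to favor short documents.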

Last edited: 2017.12.07 01:03:54
