IR-chapter7: computing scores in a complete search system

efficient scoring and ranking

FastCosineScore
  • constructing a heap to pick out top K components
  • Inexact top K document retrieval
  • index elimination
    consider only documents containing terms whose idf exceeds a preset threshold
    consider only documents containing many (or all) of the query terms
  • champion list
    precompute, for each term, the r documents with the highest weights for that term.
    r does not need to be the same for every term (the rarer the term, the larger r can be).
  • static quality scoring and ordering
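    net-score(q, d) = g(d) + cosine similarity of q and d, where g(d) is a query-independent static quality score
    global champion list: for each term, the r documents with the highest g(d) + tf-idf values; extension: keep two postings lists per term (high and low), each sorted by g(d) value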
  • impact ordering
    postings sorted by a common ordering: document-at-a-time scoring
    postings not sorted by a common ordering: term-at-a-time scoring
    postings ordered by decreasing tf value; two ideas reduce the work:
    1. stop after considering a prefix of each postings list
    2. consider query terms in decreasing order of idf
  • cluster pruning
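    preprocessing: pick √N documents at random as leaders; for every other document (follower), compute its nearest leader and attach it to that leader's cluster
    query: compute the cosine similarity from q to each leader, then take the closest leader L and its followers as the candidate set
    variation - b1, b2: attach each follower to its b1 closest leaders; at query time consider the b2 leaders closest to q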
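
The cluster pruning steps above can be sketched roughly as follows. This is only an illustration under assumptions: documents and queries are sparse `{term: weight}` dicts, and the names `cosine`, `build_clusters` and `search` are made up for these notes, not taken from the book. A heap (`heapq.nlargest`) picks the top K among the candidates.

```python
import heapq
import math
import random


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_clusters(docs, b1=1, seed=0):
    """Pick about sqrt(N) random leaders; attach every other doc to its b1 nearest leaders."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    n_leaders = max(1, round(math.sqrt(len(docs))))
    leaders = rng.sample(sorted(docs), n_leaders)
    clusters = {leader: [leader] for leader in leaders}
    for doc_id in docs:
        if doc_id in clusters:
            continue  # leaders already belong to their own cluster
        nearest = heapq.nlargest(
            b1, leaders, key=lambda leader: cosine(docs[doc_id], docs[leader])
        )
        for leader in nearest:
            clusters[leader].append(doc_id)
    return clusters


def search(query, docs, clusters, k=3, b2=1):
    """Score only the members of the b2 clusters whose leaders are closest to the query."""
    closest = heapq.nlargest(
        b2, clusters, key=lambda leader: cosine(query, docs[leader])
    )
    candidates = {d for leader in closest for d in clusters[leader]}
    # The candidate set may hold fewer than k documents, so fewer may be returned.
    return heapq.nlargest(k, candidates, key=lambda d: cosine(query, docs[d]))
```

With `b1 = b2 = 1` this is the basic scheme; larger values trade extra work for a better chance of finding the true top K.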

components of an information retrieval system

  • tiered indexes
    motivation: the candidate set A may contain fewer than K documents
    solution (example): set a tf threshold of 20 for tier 1 and 10 for tier 2, so the tier 1 index holds only postings entries with tf values exceeding 20 and the tier 2 index holds only postings entries with tf values exceeding 10; if tier 1 yields fewer than K documents, fall back to tier 2, and so on.
[figure: tiered indexes]
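
A minimal sketch of the tiered index idea, using the thresholds 20 and 10 from the example. Several things here are assumptions for illustration: the tiers are kept disjoint, a final catch-all tier holds the remaining postings, a document must contain all query terms to match, and matches are ranked by summed tf. The names `build_tiers` and `retrieve` are hypothetical.

```python
def build_tiers(postings, thresholds=(20, 10)):
    """Split {term: {doc_id: tf}} into tiers by tf threshold.

    Tier i holds postings whose tf exceeds thresholds[i] and that no higher
    tier already holds; one extra final tier holds everything left over.
    """
    tiers = [dict() for _ in range(len(thresholds) + 1)]
    for term, docs in postings.items():
        for doc_id, tf in docs.items():
            level = next(
                (i for i, th in enumerate(thresholds) if tf > th), len(thresholds)
            )
            tiers[level].setdefault(term, {})[doc_id] = tf
    return tiers


def retrieve(query_terms, tiers, k):
    """Descend the tiers until at least k documents contain all query terms."""
    if not query_terms:
        return []
    merged = {term: {} for term in query_terms}
    matches = set()
    for tier in tiers:
        for term in query_terms:
            merged[term].update(tier.get(term, {}))
        matches = set.intersection(*(set(merged[t]) for t in query_terms))
        if len(matches) >= k:
            break  # enough candidates, lower tiers are never touched
    # Rank by summed tf (illustrative only), breaking ties by doc id.
    ranked = sorted(
        matches, key=lambda d: (-sum(merged[t][d] for t in query_terms), d)
    )
    return ranked[:k]
```

For small K only tier 1 is read; lower tiers are consulted only when the higher ones cannot supply K documents.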
  • designing parsing and scoring function
    query parser - translate the user-specified keywords into a query with various operators
    scoring function - manual configuration or machine-learned scoring

  • putting it all together

a complete search system

result snippets: short passages of text accompanying each document in the results list for a query.

Vector space scoring and query operator interaction

Google: free text queries are treated with roughly conjunctive semantics, retrieving only documents that contain all or most of the query terms.

  • Boolean retrieval
  • wildcard queries
  • phrase queries