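Reddit's "hot" ranking algorithm for posts is simple and effective. It measures how hot a post is along three dimensions: upvotes, downvotes, and submission time.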
The corresponding Python code is as follows:
    # Rewritten code from /r2/r2/lib/db/_sorts.pyx
    from datetime import datetime, timedelta
    from math import log

    epoch = datetime(1970, 1, 1)

    def epoch_seconds(date):
        td = date - epoch
        return td.days * 86400 + td.seconds + (float(td.microseconds) / 1000000)

    def score(ups, downs):
        return ups - downs

    def hot(ups, downs, date):
        s = score(ups, downs)
        order = log(max(abs(s), 1), 10)
        sign = 1 if s > 0 else -1 if s < 0 else 0
        seconds = epoch_seconds(date) - 1134028003
        return round(sign * order + seconds / 45000, 7)
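To make the trade-off in this formula concrete, here is a minimal sketch that restates the `hot` function above and compares two posts. The timestamp and vote counts are made-up illustrative values, and `total_seconds()` is used here in place of the manual `epoch_seconds` arithmetic. Since the vote term is a base-10 logarithm and the time term is divided by 45000, a tenfold lead in net votes should be cancelled out by a post that is 45000 seconds (12.5 hours) newer.

```python
from datetime import datetime, timedelta
from math import log

EPOCH = datetime(1970, 1, 1)
OFFSET = 1134028003  # same constant as in the code above


def hot(ups, downs, date):
    """Restatement of the hot() function above, for a self-contained demo."""
    s = ups - downs
    order = log(max(abs(s), 1), 10)
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = (date - EPOCH).total_seconds() - OFFSET
    return round(sign * order + seconds / 45000, 7)


# Arbitrary illustrative timestamp (an assumption, not from the original post).
base = datetime(2020, 1, 1, 12, 0, 0)

# Post A: net score 100, submitted at `base`.
older_popular = hot(100, 0, base)
# Post B: net score 10, submitted 45000 seconds (12.5 hours) later.
newer_modest = hot(10, 0, base + timedelta(seconds=45000))
# Post C: net score -10, submitted at `base`; the sign term pushes it down.
downvoted = hot(0, 10, base)
# Post D: net score 0, submitted at `base`.
neutral = hot(0, 0, base)

print("older, 100 net votes:", older_popular)
print("newer, 10 net votes: ", newer_modest)
print("gap between A and B: ", round(newer_modest - older_popular, 6))
print("downvoted vs neutral:", downvoted, neutral)
```

Posts A and B end up with essentially the same value, which is one way to read the constant 45000: each extra factor of ten in net votes buys a post 12.5 hours of freshness.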
Below is a version modified for ranking smzdm products (http://39.106.99.186/):
    from datetime import datetime
    from math import log

    epoch = datetime(1970, 1, 1)

    def epoch_seconds(date):
        td = date - epoch
        return td.days * 86400 + td.seconds + (float(td.microseconds) / 1000000)

    def score(ups, downs):
        return ups - downs / 2  # there are too many downvotes, so count only half of them

    def hot(ups, downs, date):
        s = score(ups - 1, downs)  # subtract 1 to drop the poster's own upvote
        order = log(max(abs(s), 1), 10)
        sign = 1 if s > 0 else -1 if s < 0 else 0
        # seconds elapsed since 2005-12-08 15:46:43 (UTC+8), i.e. 07:46:43 UTC
        seconds = epoch_seconds(date) - 1134028003
        # w: a score of s = 10 is worth the same as being posted w hours later
        w = 1.25
        return round(sign * order + seconds / (3600 * w), 7)
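The effect of `w` and of the halved downvotes can be checked with a short sketch. It restates the modified functions under the hypothetical name `hot_smzdm` and exposes `w` as a keyword argument purely for illustration; the timestamp and vote counts are made-up values, not real smzdm data.

```python
from datetime import datetime, timedelta
from math import log

EPOCH = datetime(1970, 1, 1)
OFFSET = 1134028003  # same constant as in the code above


def score(ups, downs):
    # Downvotes count for half, as in the modified version above.
    return ups - downs / 2


def hot_smzdm(ups, downs, date, w=1.25):
    """Hypothetical restatement of the modified hot(), with w as a parameter."""
    s = score(ups - 1, downs)  # drop the poster's own upvote
    order = log(max(abs(s), 1), 10)
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = (date - EPOCH).total_seconds() - OFFSET
    return round(sign * order + seconds / (3600 * w), 7)


# Arbitrary illustrative timestamp (an assumption, not from the original post).
base = datetime(2020, 1, 1, 12, 0, 0)

# Item A: 11 upvotes, so s = 10 after removing the poster's own vote.
item_a = hot_smzdm(11, 0, base)
# Item B: only the poster's own upvote (s = 0), listed w = 1.25 hours later.
item_b = hot_smzdm(1, 0, base + timedelta(hours=1.25))
# Item C: 21 upvotes and 20 downvotes give s = 20 - 10 = 10, the same as item A.
item_c = hot_smzdm(21, 20, base)

print("item A (s = 10):             ", item_a)
print("item B (s = 0, 1.25 h newer):", item_b)
print("item C (21 up, 20 down):     ", item_c)
```

Items A and B come out essentially equal, so with `w = 1.25` a product needs a tenfold vote lead to make up for being listed 1.25 hours earlier. A larger `w` lets votes outweigh recency for longer, and a smaller `w` makes the list turn over faster.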
For more refined approaches, see the write-up on Reddit's ranking algorithms.
Reference: https://medium.com/hacking-and-gonzo/how-reddit-ranking-algorithms-work-ef111e33d0d9