spark
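Problem description
While working on the business logic for product recommendations, I ran into a problem computing item preference weights: the real-time weights have to be combined with the offline weights. For items with the same key, the two weights are summed (and then halved, as the calculation below shows); for keys that appear on only one side, the real-time or offline weight is kept unchanged. Because the business code runs as a real-time computation, it had to be done in Scala (I don't know Java = =|). The data looks like this:
Time point 1: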
bean_json:online_rating value=C01:B02:12.00#C01:B01:8.00
bean_json:real_time_rating value=C01:B01:40.00:100#C01:B02:60.00:100
Time point 2:
bean_json:online_rating value=C01:B02:9.00#C01:B04:8.00#C01:B01:8.00#C01:B03:6.00
bean_json:real_time_rating value=C01:B03:30.00:100#C01:B04:40.00:100#C01:B02:30.00:100
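Each value above is a list of entries separated by '#', and each entry has fields separated by ':'. As a minimal sketch of how such a string could be turned into a map, the following uses a hypothetical helper named `parseWeights` (not from the original code). Treating the first two fields as the key and the third as the weight follows what the full code later in this post does; the fourth field in real_time_rating (the trailing 100) is not explained here, so the sketch simply ignores it.

```scala
object WeightParsing {
  // Hypothetical helper for illustration only.
  // Turns "C01:B02:12.00#C01:B01:8.00" into Map("C01:B02" -> 12.0, "C01:B01" -> 8.0).
  def parseWeights(value: String): Map[String, Double] =
    value.split("#").filter(_.nonEmpty).map { entry =>
      val fields = entry.split(":")
      // fields(0) and fields(1) form the key, fields(2) is the weight;
      // any further fields (such as the trailing 100) are ignored.
      (fields(0) + ":" + fields(1), fields(2).toDouble)
    }.toMap
}
```

Skipping empty entries is a choice made for this sketch, so that an empty input string yields an empty map instead of an index error.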
Business calculation logic
At time point 1
The real-time weights of the items are:
C01:B01 = 40.00
C01:B02 = 60.00
Offline weight = real-time weight * decay factor 0.2
C01:B02 = 60*0.2 = 12.00
C01:B01 = 40*0.2 = 8.00
At time point 2
The new real-time weights are:
C01:B03 = 30.00
C01:B04 = 40.00
C01:B02 = 30.00
The new offline weights are:
C01:B02 = (12.00 + 30*0.2)/2 = 9.00
C01:B04 = 40*0.2 = 8.00
C01:B01 = 8.00
C01:B03 = 30*0.2 = 6.00
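The arithmetic above can be replayed in a few lines of Scala. This is only a sketch: the names `DecayReplay`, `decay`, and `step` are made up for illustration, and starting from an empty offline map at time point 1 is an assumption, since the post does not say what the offline weights were before that. It spells the rule out case by case, which is a different formulation from a one-line map merge.

```scala
object DecayReplay {
  val decay = 0.2 // decay factor from the post

  // One update step: combine the stored offline weights with new real-time weights.
  def step(offline: Map[String, Double], realTime: Map[String, Double]): Map[String, Double] =
    (offline.keySet ++ realTime.keySet).map { key =>
      val updated = (offline.get(key), realTime.get(key)) match {
        case (Some(off), Some(rt)) => (off + rt * decay) / 2 // key on both sides: average
        case (Some(off), None)     => off                    // offline only: unchanged
        case (None, Some(rt))      => rt * decay             // real-time only: decayed
        case (None, None)          => 0.0                    // unreachable: key comes from the union
      }
      key -> updated
    }.toMap

  // Time point 1: no offline weights yet (assumed); real-time B01 = 40, B02 = 60.
  val t1 = step(Map.empty, Map("C01:B01" -> 40.0, "C01:B02" -> 60.0))
  // Time point 2: real-time B03 = 30, B04 = 40, B02 = 30.
  val t2 = step(t1, Map("C01:B03" -> 30.0, "C01:B04" -> 40.0, "C01:B02" -> 30.0))
}
```

With these inputs, `DecayReplay.t2` should hold the same four weights as the list above (9.00, 8.00, 8.00, and 6.00), up to floating-point rounding.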
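After a lot of trial and error, I finally found the approach in this article: "scala 两个map合并,key相同时value相加" (Scala: merging two maps and adding the values when the keys are the same).
The main idea is to convert the two sets of weights into Map[key, value] and, when merging the two maps, use getOrElse to handle each entry conditionally. The condition is whether a key exists in both maps: if it does, the two values are combined; if not, the entry is kept as it is.
A single line of code produces the result:
val unionMap = map1 ++ map2.map(t => t._1 -> (t._2 + map1.getOrElse(t._1, t._2))/2) // for keys in both maps, sum the weights and divide by 2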
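To see why that single line implements the rule, it may help to run it on two tiny maps. The sketch below uses made-up keys "a", "b", "c" and made-up weights; only the `unionMap` expression is taken from the post. For a key found only in `map2`, `getOrElse` falls back to the key's own value, so `(v + v) / 2` is just `v`. Keys found only in `map1` pass through `++` untouched. For shared keys, the right-hand operand of `++` wins, so the averaged value replaces the one from `map1`.

```scala
object MergeOneLiner {
  // Illustrative data only; the weights are chosen to be exactly representable as doubles.
  val map1 = Map("a" -> 6.0, "b" -> 8.0)  // e.g. decayed real-time weights
  val map2 = Map("a" -> 12.0, "c" -> 8.0) // e.g. stored offline weights

  // The merge expression from the post, unchanged.
  val unionMap = map1 ++ map2.map(t => t._1 -> (t._2 + map1.getOrElse(t._1, t._2)) / 2)
  // "a" is in both maps:  (12.0 + 6.0) / 2 = 9.0
  // "b" is only in map1:  stays 8.0
  // "c" is only in map2:  (8.0 + 8.0) / 2 = 8.0
}
```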
Full code
/**
 * Compute the new weights.
 * @param realTimeWeight real-time weights (the real_time_rating value)
 * @param onlineWeight offline weights (the online_rating value)
 * @return the new weights
 */
def dealNewOnRating2(realTimeWeight: String, onlineWeight: String): String = {
  // Parse '#'-separated "category:item:weight[:...]" entries into key -> weight.
  // Empty entries (for example from an empty input string) are skipped.
  def parse(weights: String): Map[String, Double] =
    weights.split("#").filter(_.nonEmpty).map { entry =>
      val fields = entry.split(":")
      (fields(0) + ":" + fields(1), fields(2).toDouble)
    }.toMap

  // Real-time weights are multiplied by the decay factor 0.2; offline weights are used as stored.
  val map1 = parse(realTimeWeight).map { case (key, weight) => key -> weight * 0.2 }
  val map2 = parse(onlineWeight)
  // For keys in both maps, sum the weights and divide by 2; other keys keep their value.
  val unionMap = map1 ++ map2.map(t => t._1 -> (t._2 + map1.getOrElse(t._1, t._2)) / 2)
  // Sort by weight in descending order and keep the top 10.
  val unionWeight = unionMap.toArray.sortBy(_._2)(Ordering[Double].reverse).take(10)
  // Render as "key:weight" with two decimals, joined by '#' with no trailing separator,
  // which matches the sample data shown earlier.
  unionWeight.map { case (cateInfo, weight) => cateInfo + ":" + "%.2f".format(weight) }.mkString("#")
}